# Spring Cleaning!

It's that time of year again: the end of the fiscal year and the time to begin financial spring cleaning! Within a month, auditors will be camped out at your investment firm, inspecting everyone's trades and the company's end-of-year financial statements. All of the traders in the firm are under a lot of pressure to finalize their portfolio earnings and deliver them to their managers. That is, all traders except you.

You automated your end-of-year financial reporting last week, and now you're using the pipeline to help Harold with his reports. Before loading Harold's stock ticker data into Pandas, you open the file he sent you in Excel to check the quality of the data. You quickly realize that Harold hasn't followed any data quality standards and that the data is a mess.

For this activity, use Pandas to clean Harold's portfolio data so that it's fit for use.

## Instructions

Using the [starter file](Unsolved/Core/spring_cleaning.ipynb) and Harold's financial [data](Resources/stock_data.csv), complete the following steps.

1. Load CSV data into Pandas using `read_csv`.

2. Identify the number of rows and columns in the DataFrame, otherwise known as its shape.

3. Generate a sample of the data to visually confirm that it loaded correctly.

4. Identify the number of records in the DataFrame and compare it with the number of rows in the original file.

5. Identify null records by calculating the percentage of null values in each Series. **Hint:** This step will require the `mean` function applied to the output of `isnull`.

6. Drop null records.

7. Validate that all nulls have been dropped by calculating the `sum` of null values.

8. Default null `ebitda` values to 0.

9. Check that there are no null `ebitda` values using the `sum` function.

10. Remove duplicate rows.

## Challenge

Complete this challenge using the [starter file](Unsolved/Challenge/spring_cleaning.ipynb).

Now that nulls and duplicates have been wrangled, clean up the data a little more by removing the `$` currency symbols from the `price` field. Then, use the `astype` function to cast `price` to a `float`.

## Hint

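Pandas offers a `replace` method that can be called on a Series; see the [`Series.replace` documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.replace.html). Note that `Series.replace` matches whole values by default, so removing a substring such as `$` requires `regex=True` (with the `$` escaped), or the `Series.str.replace` method instead.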


Harold's stock data is a mess! Help him clean up his data before the auditors arrive!

```python
# Import Libraries
import pandas as pd
```


### Load CSV data into Pandas using `read_csv`

```python
csv_data = pd.read_csv('stock_data.csv')
csv_data
```

| Unnamed: 0 | symbol | name | sector | price | price_per_earnings | dividend_yield | earnings_per_share | 52_week_low | 52_week_high | market_cap | ebitda | price_per_sales | price_per_book | sec_filings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M Company | Industrials | $222.89 | 24.31 | 2.332862 | $7.92 | 259.770 | 175.49 | 1.387211e+11 | 9.048000e+09 | 4.390271 | 11.34 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 1 | AOS | A.O. Smith Corp | Industrials | | | | | | | | | | | |
| 2 | ABT | Abbott Laboratories | Health Care | 56.27 | 22.51 | 1.908982 | 0.26 | 64.600 | 42.28 | 1.021210e+11 | 5.744000e+09 | 3.740480 | 3.19 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 3 | ABBV | AbbVie Inc. | Health Care | 108.48 | 19.41 | 2.499560 | 3.29 | 125.860 | 60.05 | 1.813863e+11 | 1.031000e+10 | 6.291571 | 26.14 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 4 | ATVI | Activision Blizzard | Information Technology | 65.83 | | 0.431903 | 1.28 | 74.945 | 38.93 | 5.251867e+10 | 2.704000e+09 | 10.595120 | 5.16 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 499 | XYL | Xylem Inc. | Industrials | 70.24 | 30.94 | 1.170079 | 1.83 | 76.810 | 46.86 | 1.291502e+10 | 7.220000e+08 | 2.726209 | 5.31 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 500 | YUM | Yum! Brands Inc | Consumer Discretionary | 76.3 | 27.25 | 1.797080 | 4.07 | 86.930 | 62.85 | 2.700330e+10 | 2.289000e+09 | 6.313636 | 212.08 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 501 | ZBH | Zimmer Biomet Holdings | Health Care | 115.53 | 14.32 | 0.794834 | 9.01 | 133.490 | 108.17 | 2.445470e+10 | 2.007400e+09 | 3.164895 | 2.39 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
| 502 | ZION | Zions Bancorp | Financials | 50.71 | 17.73 | 1.480933 | 2.6 | 55.610 | 38.43 | 1.067068e+10 | 0.000000e+00 | 3.794579 | 1.42 | http://www.sec.gov/cgi-bin/browse-edgar?action... |
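The `Unnamed: 0` column in this output suggests the file was saved with its row index as an unlabeled first column (an assumption based on the header alone). The sketch below reads a hypothetical two-row excerpt from an in-memory string via `io.StringIO`, so it runs without Harold's file, and shows how `read_csv` handles that column by default and with `index_col=0`.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same layout as Harold's file: the first
# header cell is empty, which pandas reports as "Unnamed: 0".
raw_csv = (
    ",symbol,name,price\n"
    "0,MMM,3M Company,$222.89\n"
    "1,AOS,A.O. Smith Corp,\n"
)

# Default behaviour: the unlabeled column is kept as an ordinary column.
default_df = pd.read_csv(io.StringIO(raw_csv))
print(default_df.columns.tolist())

# index_col=0 uses that first column as the row index instead.
indexed_df = pd.read_csv(io.StringIO(raw_csv), index_col=0)
print(indexed_df.columns.tolist())
```

Either form works for the remaining steps; `index_col=0` simply avoids carrying a redundant column through the cleanup.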


### Identify the number of rows and columns (shape) in the DataFrame.
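`shape` is a DataFrame attribute (not a method) holding a `(rows, columns)` tuple. A minimal sketch on a hypothetical three-row stand-in for `csv_data`:

```python
import pandas as pd

# Hypothetical stand-in for csv_data.
df = pd.DataFrame({
    "symbol": ["MMM", "AOS", "ABT"],
    "sector": ["Industrials", "Industrials", "Health Care"],
    "price": ["$222.89", None, "56.27"],
})

# shape is accessed without parentheses and unpacks into two integers.
rows, columns = df.shape
print(rows, columns)  # 3 3
```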

### Generate a sample of the data to visually confirm that it loaded correctly.
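`head` returns the first rows of a DataFrame, while `sample` draws rows at random; passing `random_state` makes the random draw repeatable between runs. The five-row DataFrame below is hypothetical stand-in data.

```python
import pandas as pd

# Hypothetical stand-in for csv_data.
df = pd.DataFrame({
    "symbol": ["MMM", "AOS", "ABT", "ABBV", "ATVI"],
    "price": ["$222.89", None, "56.27", "108.48", "65.83"],
})

# First three rows: useful for checking headers and column alignment.
first_rows = df.head(3)
print(first_rows)

# Two rows chosen at random; random_state fixes which rows are picked.
random_rows = df.sample(n=2, random_state=1)
print(random_rows)
```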

### Identify the number of records in the DataFrame and compare it with the number of rows in the original file.
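`len` counts every row in a DataFrame, whereas `count` reports the number of non-null values in each column, so the two can differ. The sketch below compares both with a naive line count of the raw text (header excluded). The CSV is a hypothetical excerpt held in a string, and a line count like this would overstate the rows if a quoted field ever spanned several lines.

```python
import io

import pandas as pd

# Hypothetical excerpt; the AOS row has empty price and ebitda fields.
raw_csv = (
    "symbol,price,ebitda\n"
    "MMM,$222.89,9048000000\n"
    "AOS,,\n"
    "ABT,56.27,5744000000\n"
)

df = pd.read_csv(io.StringIO(raw_csv))

# Rows in the raw file: every line except the header.
file_rows = len(raw_csv.strip().splitlines()) - 1

# len() counts all rows; count() counts non-null values per column.
counts = df.count()
print(file_rows, len(df))
print(counts)
```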

### Identify null records
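`isnull` marks each cell `True` or `False`, and the mean of a boolean column is the fraction of `True` values, so `isnull().mean() * 100` gives the percentage of nulls per Series. The values below are a hypothetical excerpt modeled on the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt: AOS is missing values, ATVI lacks price_per_earnings.
df = pd.DataFrame({
    "symbol": ["MMM", "AOS", "ABT", "ATVI"],
    "price_per_earnings": [24.31, np.nan, 22.51, np.nan],
    "ebitda": [9.048e9, np.nan, 5.744e9, 2.704e9],
})

# Fraction of nulls per column, scaled to a percentage.
null_percent = df.isnull().mean() * 100
print(null_percent)
```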

### Drop Null Records
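`dropna` returns a new DataFrame without the rows that contain nulls, so the result must be assigned to a variable. A sketch on the same kind of hypothetical excerpt, also showing the `subset` parameter, which restricts the check to chosen columns:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with missing values.
df = pd.DataFrame({
    "symbol": ["MMM", "AOS", "ABT", "ATVI"],
    "price_per_earnings": [24.31, np.nan, 22.51, np.nan],
    "ebitda": [9.048e9, np.nan, 5.744e9, 2.704e9],
})

# Default how="any": drop every row with at least one null.
cleaned = df.dropna()
print(cleaned)

# Only rows whose ebitda is null are dropped here.
ebitda_only = df.dropna(subset=["ebitda"])
print(ebitda_only)
```

With the default `how="any"`, a row missing only `ebitda` is dropped too, so there may be no null `ebitda` values left when the defaulting step runs; `subset`, or calling `fillna` first, keeps such rows.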

### Validate nulls have been dropped
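Summing `isnull()` counts the nulls in each column, because `True` counts as 1; summing again gives a single total for the whole DataFrame. A sketch on hypothetical data that has already been through `dropna`:

```python
import numpy as np
import pandas as pd

# Hypothetical data; dropna() removes the AOS row with a null price.
df = pd.DataFrame({
    "symbol": ["MMM", "AOS", "ABT"],
    "price": [222.89, np.nan, 56.27],
}).dropna()

# Per-column null counts: every entry should be 0.
null_counts = df.isnull().sum()
print(null_counts)

# One total across all columns.
total_nulls = int(null_counts.sum())
print(total_nulls)  # 0
```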

### Default null `ebitda` values to 0, then validate that no `ebitda` values are null
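`fillna(0)` replaces nulls in a Series with 0. Assigning the result back to the column avoids relying on in-place modification of a column selection, which newer pandas versions warn about. The `ebitda` values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ebitda values; NaN marks a missing figure.
df = pd.DataFrame({
    "symbol": ["MMM", "ZION", "AOS"],
    "ebitda": [9.048e9, 0.0, np.nan],
})

# fillna returns a new Series; assign it back to the column.
df["ebitda"] = df["ebitda"].fillna(0)

# Should print 0 once every missing ebitda has been defaulted.
print(df["ebitda"].isnull().sum())
```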

### Drop Duplicates
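`drop_duplicates` removes rows that match an earlier row in every column and keeps the first occurrence by default. The hypothetical data below also shows why `subset` can matter: the same ticker stored once with and once without a `$` is not a whole-row duplicate. Similarly, a column of distinct index values such as `Unnamed: 0` would keep otherwise identical rows apart.

```python
import pandas as pd

# Hypothetical rows: MMM is repeated exactly; ABT differs only by "$".
df = pd.DataFrame({
    "symbol": ["MMM", "ABT", "MMM", "ABT"],
    "price": ["$222.89", "56.27", "$222.89", "$56.27"],
})

# Whole-row comparison: only the exact MMM repeat is removed.
deduped = df.drop_duplicates()
print(len(df), len(deduped))  # 4 3

# Comparing on symbol alone keeps the first row for each ticker.
by_symbol = df.drop_duplicates(subset=["symbol"])
print(by_symbol)
```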

### Sample `price` field

### Clean `price` Series by replacing `$`
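A sketch of sampling and cleaning a hypothetical `price` Series, with two ways to strip `$`. `Series.str.replace` works on substrings; `regex=False` treats `$` literally, since in a regular expression `$` is an end-of-string anchor. `Series.replace` substitutes inside strings only when `regex=True`, so there the `$` is escaped.

```python
import pandas as pd

# Hypothetical prices with inconsistent "$" prefixes.
price = pd.Series(["$222.89", "56.27", "$108.48"], name="price")
print(price.head())

# Substring replacement with "$" taken literally.
cleaned = price.str.replace("$", "", regex=False)
print(cleaned.tolist())  # ['222.89', '56.27', '108.48']

# Same result via Series.replace and an escaped regex pattern.
cleaned_alt = price.replace(r"\$", "", regex=True)
print(cleaned_alt.tolist())
```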

### Confirm data type of `price`

### Cast the `price` Series to float, then validate using `dtype`
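Checking `dtype` before and after `astype(float)` confirms the conversion. A sketch on a hypothetical, already-cleaned `price` Series:

```python
import pandas as pd

# Hypothetical price strings with the "$" already removed.
price = pd.Series(["222.89", "56.27", "108.48"], name="price")

# Before casting, the values are stored as strings (not a numeric dtype).
print(price.dtype)

# astype(float) parses each string; it raises ValueError if any value,
# such as a leftover "$222.89", is not a valid number.
price_float = price.astype(float)
print(price_float.dtype)  # float64
```

If some values might still be malformed, `pd.to_numeric(price, errors="coerce")` converts them to `NaN` instead of raising, at the cost of reintroducing nulls.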