## Project: Data Cleaning with Pandas

#### Below is a series of user stories, each followed by a Python code block
* These user stories will go through the process of importing, cleaning, and exporting the included `dirty_cars_dataset.csv` file
* Be sure to read each question carefully, and to test and debug your code to ensure the user story is completed correctly!


### As a Data Analyst, I want to set up the proper imports so I have access to the Pandas library

In [None]:
import pandas as pd


### As a Data Analyst, I want to import and store the `dirty_cars_dataset.csv` file in a variable
✅ I want to use the `index` column from this .csv file as the `index column` of my DataFrame

In [None]:
df = pd.read_csv("dirty_cars_dataset.csv", index_col="index")
df.head()
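
As a minimal sketch of what `index_col` does, the example below reads a tiny in-memory CSV instead of the real file. The CSV text, `csv_text`, and `sample_df` are made up for illustration and are not taken from `dirty_cars_dataset.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "index,company,price\n0,audi,13495\n1,bmw,16500\n"

# index_col="index" turns the "index" column into the DataFrame's row labels
# instead of keeping it as an ordinary data column.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col="index")

print(sample_df.index.name)
print(list(sample_df.columns))
```

Without `index_col`, pandas would create its own 0-based index and keep `index` as a regular column.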

### As a Data Analyst, I want to view the **information** about my new DataFrame to answer the following questions:
##### Enter your responses in the Markdown block below

* How many entries are in this DataFrame: 64
* How many columns are in this DataFrame:  9
* Which column(s) contain null values in this DataFrame: price

In [None]:
df.info()
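
`info()` lists non-null counts per column, so null columns have to be spotted by comparing numbers. A more direct check, sketched here on made-up data (`sample_df` and its values are hypothetical), is to count missing values with `isna().sum()`.

```python
import pandas as pd

# Hypothetical stand-in data; not the real dataset.
sample_df = pd.DataFrame(
    {
        "company": ["audi", "bmw", "toyota"],
        "price": [13495.0, None, 6488.0],
    }
)

# Number of missing values in each column.
null_counts = sample_df.isna().sum()

# Names of the columns that contain at least one null.
columns_with_nulls = null_counts[null_counts > 0].index.tolist()
print(columns_with_nulls)
```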


### As a Data Analyst, I want to remove any null values from the DataFrame
✅ I want to **create a new DataFrame variable** when I remove these null values<br>
✅ Then, I want to display the **information** about my new DataFrame, to confirm the null values were successfully removed

In [None]:
nonnull_df = df.dropna()
nonnull_df.info()

### As a Data Analyst, I want to check if there are any **duplicate rows** within my DataFrame

In [None]:
nonnull_df.duplicated()
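
`duplicated()` returns one boolean per row, which is hard to scan by eye. One possible way to summarize it, shown on a small hypothetical DataFrame (`sample_df` is not the real data), is to sum the mask, since `True` counts as 1.

```python
import pandas as pd

# Hypothetical stand-in data with one repeated row.
sample_df = pd.DataFrame(
    {"company": ["audi", "bmw", "audi"], "price": [13495, 16500, 13495]}
)

# duplicated() marks every repeat of an earlier row as True.
duplicate_mask = sample_df.duplicated()
duplicate_count = int(duplicate_mask.sum())

# After drop_duplicates(), the same check should report zero.
deduped_df = sample_df.drop_duplicates()
remaining_duplicates = int(deduped_df.duplicated().sum())

print(duplicate_count, remaining_duplicates)
```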

### As a Data Analyst, I want to **remove** any duplicate values from the DataFrame
✅ I want to **create a new DataFrame variable** when I remove these duplicate values<br>
✅ I want to again check if there are any duplicate rows within my DataFrame, to ensure the values were removed successfully

In [None]:
nodup_df = nonnull_df.drop_duplicates()
nodup_df.duplicated().sum()  # 0 means no duplicate rows remain

### As a Data Analyst, I want to ensure I remove any outlier values from my DataFrame to avoid inaccurate analysis of my data
✅ I want to **create a new DataFrame variable** when I remove these values<br><br>
💡 **Hint:** These inaccuracies will be within the `price` column 💡<br><br>
💡 **Hint:** There will be both **high** and **low** outlier values 💡

In [None]:
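# Removing low outlier values: an ascending sort puts the lowest prices first
sorted_df = nodup_df.sort_values("price")
sorted_df.head()
houtlier_df = sorted_df.drop(53)  # drop the outlier row by its index label
houtlier_df.head()

In [None]:
# Removing high outlier values: the highest prices sit at the end of the ascending sort
lsorted_df = houtlier_df.sort_values("price")
lsorted_df.tail()
loutlier_df = lsorted_df.drop(2)  # drop the outlier row by its index label
loutlier_df.tail()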
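
Dropping rows by a hard-coded index label only works while the data stays exactly the same. As an alternative technique (not the method used in the cells above), the sketch below filters with the common 1.5 × IQR rule. The prices, `sample_df`, and the 1.5 multiplier are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical prices with one very low and one very high value.
sample_df = pd.DataFrame(
    {"price": [10, 5000, 6000, 7000, 8000, 9000, 1_000_000]}
)

# Interquartile range of the price column.
q1 = sample_df["price"].quantile(0.25)
q3 = sample_df["price"].quantile(0.75)
iqr = q3 - q1

# Rows outside [q1 - 1.5 * IQR, q3 + 1.5 * IQR] are treated as outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the rows whose price lies within the bounds (inclusive).
filtered_df = sample_df[sample_df["price"].between(lower_bound, upper_bound)]
print(filtered_df)
```

Whether the IQR rule suits a given dataset is a judgment call; inspecting the sorted values first, as in the cells above, is still a sensible check.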

### As a Data Analyst, I want to reformat the **company** series, ensuring all company name values are properly title cased

In [None]:
loutlier_df["company"] = loutlier_df["company"].str.title()
loutlier_df.head()
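
To illustrate how `str.title()` behaves on different inputs, here is a small sketch with made-up company names (the `companies` Series is hypothetical). Note that it also lowercases everything after the first letter of each word, so acronyms lose their capitals.

```python
import pandas as pd

# Hypothetical company names in mixed casing.
companies = pd.Series(["alfa-romero", "BMW", "mercedes benz"])

# str.title() capitalizes the first letter of each word and lowercases the rest.
# A hyphen or a space starts a new word.
titled = companies.str.title()
print(titled.tolist())
```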

### As a Data Analyst, I want to create a ***new*** column on my DataFrame to represent the **price of each car in Euros**
💡 **Use the conversion rate 1.05 USD == 1 Euro** 💡

In [None]:
loutlier_df["price_euro"] = loutlier_df["price"] / 1.05  # 1.05 USD == 1 Euro, so divide USD by 1.05
loutlier_df.head()
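
The direction of the conversion is easy to get backwards. Since 1.05 USD equals 1 Euro, a USD amount is divided by 1.05 to get Euros. A worked sketch on hypothetical prices follows (`USD_PER_EURO`, `sample_df`, and the rounding to two decimals are illustrative choices).

```python
import pandas as pd

# Conversion rate from the user story: 1.05 USD == 1 Euro.
USD_PER_EURO = 1.05

# Hypothetical USD prices chosen so the result is easy to verify by hand.
sample_df = pd.DataFrame({"price": [10500.0, 21000.0]})

# USD / (USD per Euro) = Euro; round to cents for display.
sample_df["price_euro"] = (sample_df["price"] / USD_PER_EURO).round(2)
print(sample_df)
```

A quick sanity check: the Euro value should always be smaller than the USD value at this rate.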

### As a Data Analyst, I want to rename the existing **price** column to show that it represents **price in USD**

In [None]:
renamed_df = loutlier_df.rename(columns={"price": "price_usd"})
renamed_df.head()

### As a Data Analyst, I want to output my cleaned DataFrame as a .csv file
✅ I want to name this file `cleaned_cars_dataset.csv`<br>
✅ I want to specify the encoding type 'utf-8'<br>
✅ I want to include this .csv file in my GitHub repository

In [None]:
renamed_df.to_csv("cleaned_cars_dataset.csv", encoding="utf-8")
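
One possible way to confirm the export worked is to read the file back and compare it with what was written. The sketch below does this in a temporary directory with a small hypothetical DataFrame (`sample_df`, `out_path`, and the values are made up).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical cleaned data with a named index, mirroring the "index" column.
sample_df = pd.DataFrame(
    {"company": ["Audi", "Bmw"], "price_usd": [13495, 16500]},
    index=pd.Index([0, 1], name="index"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "cleaned_cars_dataset.csv"

    # The index is written as the first column by default.
    sample_df.to_csv(out_path, encoding="utf-8")

    # Read the file back using the same index column and encoding.
    reloaded_df = pd.read_csv(out_path, index_col="index", encoding="utf-8")

print(reloaded_df)
```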