# Data cleaning challenge: which product do people like best?

In this challenge, you will take the role of a data scientist. You'll be given data on customer reviews for three products (Products A, B, and C), and you'll need to clean it so that your company's graphing code can run on it and show which product is best.

### Necessary files:
* There is a file in the `datasets` folder called 'product_tests.csv'. It contains 100 customer ratings for each of Products A, B, and C. Each customer has a unique user ID and rated one of the products on a scale from 0 to 5 (0 is the worst, 5 is the best).
* There is a script that runs your company's graphing code called `compare_products.py`. This script will make a graph to help figure out which product customers like best. **This script reads in a file called 'products_clean.csv' in the datasets folder. Your overall job is to clean the data to make this file!**


**First, import the `product_tests.csv` file using pandas and assign it to a variable** (remember to import pandas too)

In [None]:
# pandas is used for loading, cleaning, and exporting the ratings table
import pandas as pd

In [None]:
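# Load the raw ratings and keep them in a variable for the cleaning steps.
# Complete the relative path so it points at datasets/product_tests.csv.
products = pd.read_csv('../..')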
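Before cleaning, it can help to count how many rows each cleaning step will affect. The sketch below assumes the raw file has columns named `user_id`, `product`, and `rating` (the names used in the cleaning steps below) and that `rating` was read as a number; `count_issues` is a hypothetical helper, not part of the challenge files.

```python
import pandas as pd


def count_issues(raw: pd.DataFrame) -> dict[str, int]:
    """Count the kinds of problems the cleaning steps are meant to fix."""
    # Assumes columns user_id, product, rating, with rating numeric.
    out_of_range = (raw["rating"] < 0) | (raw["rating"] > 5)  # NaN counts as neither
    return {
        "rating_out_of_range": int(out_of_range.sum()),
        "missing_user_id": int(raw["user_id"].isna().sum()),
        "missing_product": int(raw["product"].isna().sum()),
        "missing_rating": int(raw["rating"].isna().sum()),
    }
```

Calling `count_issues` on the DataFrame you loaded gives a rough picture of the raw data. If every count is zero after steps 1 to 3 (before renaming the columns), those steps did their job.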

### Your data cleaning goals:

Your goal is to produce 'products_clean.csv', a cleaned version of the data. Take the following steps to make sure the data are clean:

1. Remove any rows where ratings (values in the `rating` column) are below 0 or above 5. These are impossible scores, so they should be removed.

2. Some rows are missing a `user_id`. Replace each missing `user_id` with the string 'unknown user'. We don't know who these users are, but we may still be able to analyze their ratings!

3. Filter out any rows where `product` or `rating` are missing. We can't analyze data if we don't know which product it was, or what the rating was!

4. Rename the `rating` column to `user_rating` and the `product` column to `product_id`. The company's code is built to use these standardized column names.

5. Once you've done all these steps, export the data to `jtc_class_code/datasets/products_clean.csv`

Make sure the CSV has exactly this name and location, because the graphing code relies on this exact file path!
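Steps 1 to 4 can be combined into one function, sketched below. It assumes the raw columns are named `user_id`, `product`, and `rating` and that `rating` is numeric; `clean_products` is a hypothetical name. Filtering the rating range first also drops rows with a missing rating, because comparisons with `NaN` are false; step 3 is kept anyway so the intent stays explicit.

```python
import pandas as pd


def clean_products(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply cleaning steps 1-4 and return a new DataFrame."""
    # Step 1: keep ratings in the inclusive range 0-5.
    # NaN ratings fail this comparison too, so they are dropped here as well.
    df = raw[raw["rating"].between(0, 5)]

    # Step 2: label rows that have no user_id instead of dropping them.
    # assign() returns a new DataFrame, so the filtered slice is not modified in place.
    df = df.assign(user_id=df["user_id"].fillna("unknown user"))

    # Step 3: drop rows missing the product or the rating.
    df = df.dropna(subset=["product", "rating"])

    # Step 4: rename columns to the names the graphing code expects.
    return df.rename(columns={"rating": "user_rating", "product": "product_id"})
```

To finish step 5, pass the result to `DataFrame.to_csv` with the path above and `index=False`, which keeps pandas from writing the row index as an extra unnamed column.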

### Comparing the products

Once you've finished, run:
```console 
$ python compare_products.py
``` 

from the command line, and if the code runs smoothly, you'll see a file called `product_chart.png` pop up to help you decide which product customers like best. 

Which product do you think is highest-rated?

If you don't get it on the first try, don't worry! Use the error messages you see, and open your `products_clean.csv` file to check what is being written out. Both will help guide your data cleaning process.
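One way to use your output file for debugging is a small check that mirrors steps 1 to 4. The hypothetical `find_problems` helper below is only a sketch: it knows what the cleaning steps ask for, not what `compare_products.py` actually requires.

```python
import pandas as pd


def find_problems(clean: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in a cleaned ratings table."""
    problems = []
    # Step 4: the standardized column names must be present.
    expected = {"user_id", "product_id", "user_rating"}
    missing_cols = expected - set(clean.columns)
    if missing_cols:
        problems.append(f"missing columns: {sorted(missing_cols)}")
        return problems  # the remaining checks need these columns
    # Step 3: no missing products or ratings.
    if clean["user_rating"].isna().any() or clean["product_id"].isna().any():
        problems.append("missing product_id or user_rating values")
    # Step 2: missing user IDs should have been replaced.
    if clean["user_id"].isna().any():
        problems.append("missing user_id values")
    # Step 1: every remaining rating lies in 0-5 (inclusive).
    if not clean["user_rating"].dropna().between(0, 5).all():
        problems.append("ratings outside 0-5")
    return problems
```

Read `products_clean.csv` back with `pd.read_csv` and pass the result in; an empty list means these checks found nothing.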

## Finished and got the plot? Decided which product is highest-rated? 

#### Congrats on finishing the data cleaning challenge! Data cleaning is not easy! 

Remember to comment all your code and push this notebook to GitHub.