<a href="https://colab.research.google.com/github/prof-rossetti/intro-to-python/blob/main/exercises/csv-processing/CSV_Processing_Exercise_(Spring_2024).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Instructions

Each of these challenges provides a different CSV-formatted dataset. Write Python code to process each file and answer the respective questions about the data it contains.



# References

You will find the following reference material directly helpful in completing this exercise.

Working with CSV files:

  + [`pandas.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
  + [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html):
    + [`pandas.DataFrame.iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html)
    + [`pandas.DataFrame.groupby()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html)
  + [`pandas.Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html):
    + [`pandas.Series.map()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html)
    + [`pandas.Series.value_counts()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html)
  + [`pandas.pivot_table()`](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html)



    




# Challenges

## Challenge 1 (Gradebook)


Given the provided "gradebook.csv" file, write Python code to read the CSV data and perform each of the following tasks...

A) Print the **column names** (i.e. `['student_id', 'final_grade']`).

B) Print the **number of students** / rows (i.e. `10`).

C) Print the **average grade** (i.e. `83.64`).

D) Print the **median grade** (i.e. `87.6`).

E) Create another column on the DataFrame called "letter_grade", and use a custom function to assign each numeric score a corresponding **letter grade**.

> HINT: use the Series object's `apply()` or `map()` method


F) Print the **percentage of students who got each letter grade**.

> HINT: use the Series object's `value_counts()` method
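As a rough sketch of how tasks A through D might be approached, the block below builds a tiny stand-in DataFrame and reads off its column names, row count, mean, and median. The `sample_df` name and its four rows are assumptions made for illustration; the real gradebook has different values.

```python
import pandas as pd

# Hypothetical stand-in data (NOT the real gradebook values).
sample_df = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "final_grade": [70.0, 80.0, 90.0, 100.0],
})

# Column names as a plain Python list.
column_names = sample_df.columns.tolist()

# Number of rows (one row per student in this sample).
row_count = len(sample_df)

# Summary statistics on a single column (a pandas Series).
mean_grade = sample_df["final_grade"].mean()
median_grade = sample_df["final_grade"].median()

print(column_names)
print(row_count)
print(round(mean_grade, 2))
print(round(median_grade, 2))
```

The same calls should work unchanged on the `df` loaded from "gradebook.csv", since they depend only on the column name `final_grade`.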

In [None]:
from pandas import read_csv

#
# GRADEBOOK
#

#df = read_csv("gradebook.csv")
df = read_csv("https://raw.githubusercontent.com/prof-rossetti/intro-to-python/master/data/gradebook.csv")

print(df.head())

   student_id  final_grade
0           1         76.7
1           2         85.1
2           3         50.3
3           4         89.8
4           5         97.4


In [None]:

# HERE IS AN EXAMPLE FUNCTION TO USE FOR LETTER GRADE CONVERSION (FEEL FREE TO REVISE AS DESIRED)

def calculate_grade(score):
    if score >= 75.0:
        grade = "Pass"
    else:
        grade = "Fail"
    return grade


# example invocations:
print(calculate_grade(50))
print(calculate_grade(90))

Fail
Pass
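For tasks E and F, one possible approach is to pass a custom function to the Series `apply()` method, then call `value_counts(normalize=True)` to get proportions. The sketch below uses a hypothetical `to_letter_grade` function with assumed cutoffs (90 / 80 / 70) and made-up scores; neither comes from the exercise itself.

```python
import pandas as pd

def to_letter_grade(score):
    """Hypothetical conversion; the cutoffs here are assumptions."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

# Made-up scores for illustration only.
grades_df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "final_grade": [95.0, 85.0, 82.0, 71.0, 50.0],
})

# apply() calls the function once per value and returns a new Series.
grades_df["letter_grade"] = grades_df["final_grade"].apply(to_letter_grade)

# normalize=True returns proportions (0 to 1) instead of raw counts.
percentages = (grades_df["letter_grade"].value_counts(normalize=True) * 100).round(1)

print(grades_df)
print(percentages)
```

Swapping `apply` for `map` should give the same result here, because the function takes a single value and returns a single value.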


## Challenge 2 (Products)

Given the provided "products.csv" file, write Python code to read the CSV data and perform each of the following tasks...

A) Print the **number of products** (rows) (i.e. `20`).

B) Print the **columns** (i.e. `['id', 'name', 'aisle', 'department', 'price']`).

C) Loop through each of the rows and **print the name and price of each product**. Use the provided `to_usd` function to format prices as USD, with a dollar sign and two decimal places.


In [None]:

# HERE IS A PRICE FORMATTING FUNCTION FOR YOU TO USE

def to_usd(my_price):
    """
        Converts a numeric value to USD-formatted string, for printing and display purposes.
        Adds dollar sign and commas for the thousands separator.
        Rounds to two decimal places.

        Param: my_price (int or float or str) like 4000.444444 or "4000.444444"

        Example: to_usd(4000.444444)

        Returns: $4,000.44
    """
    return f"${float(my_price):,.2f}"


# example invocations:
print(to_usd(4.5))
print(to_usd(1234567890.12345))

$4.50
$1,234,567,890.12
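The looping task in this challenge could be sketched with `iterrows()`, which yields an index and a row (a Series) on each pass. The block below repeats the `to_usd` function so it runs on its own, and uses a two-row `products_df` built from the first two products shown in the sample output; the output format of each printed line is an assumption.

```python
import pandas as pd

def to_usd(my_price):
    """Same formatting logic as the provided to_usd function."""
    return f"${float(my_price):,.2f}"

# Two-row sample mirroring the first products in the dataset.
products_df = pd.DataFrame({
    "name": ["Chocolate Sandwich Cookies", "All-Seasons Salt"],
    "price": [3.50, 4.99],
})

lines = []
for index, row in products_df.iterrows():
    # Use row["name"], not row.name: on a row Series, .name is the index label.
    line = f"{row['name']} ({to_usd(row['price'])})"
    lines.append(line)
    print(line)
```

Bracket access such as `row["name"]` is the safer habit inside the loop, because attribute access can collide with built-in Series attributes.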


In [None]:
#
# PRODUCTS
#

from pandas import read_csv

#df = read_csv("products.csv")
df = read_csv("https://raw.githubusercontent.com/prof-rossetti/intro-to-python/master/data/products.csv")

print(df.head())

   id                                               name  \
0   1                         Chocolate Sandwich Cookies   
1   2                                   All-Seasons Salt   
2   3               Robust Golden Unsweetened Oolong Tea   
3   4  Smart Ones Classic Favorites Mini Rigatoni Wit...   
4   5                          Green Chile Anytime Sauce   

                        aisle department  price  
0               cookies cakes     snacks   3.50  
1           spices seasonings     pantry   4.99  
2                         tea  beverages   2.49  
3                frozen meals     frozen   6.99  
4  marinades meat preparation     pantry   7.99  


## Challenge 3 (Monthly Sales)


Given the provided `sales_df` variable representing monthly retail sales (already read from the CSV file), write Python code to perform each of the following tasks...


A) What is the **structure** of this data? In other words, we have a "row per what?" Describe your answer in words.


B) How many **unique products** are sold (i.e. `7`), and what are their names? Print the list of unique products in alphabetical order (i.e. `['Baseball Cap', 'Brown Boots', 'Button-Down Shirt', 'Khaki Pants', 'Sticker Pack', 'Super Soft Hoodie', 'Vintage Logo Tee']`).

C) Print the **total monthly sales**, formatted as USD (i.e. `"$12,000.71"`).


D) Calculate the **total sales for each day**, and create a bar or line chart depicting the sales over time. Optionally, also print which five dates have the most sales.



<img src="https://user-images.githubusercontent.com/1328807/211162483-1418bd1c-7e43-42bc-b2c4-b5c24e242f0a.png" height="300"/>




E) Determine the **total sales for each product**, and create a horizontal bar chart to show the top selling products, with the bars sorted in descending order of their length.




<img src="https://user-images.githubusercontent.com/1328807/211162481-07593a51-57f9-4bfd-ab14-f0878d8bc960.png" height="300"/>






In [None]:
#
# MONTHLY SALES
#

from pandas import read_csv

month = "201803"
sales_df = read_csv(f"https://raw.githubusercontent.com/prof-rossetti/data-analytics-in-python/main/data/unit-2/monthly-sales/sales-{month}.csv")
sales_df.head()

Unnamed: 0,date,product,unit price,units sold,sales price
0,2018-03-01,Button-Down Shirt,65.05,2,130.1
1,2018-03-01,Vintage Logo Tee,15.95,1,15.95
2,2018-03-01,Sticker Pack,4.5,1,4.5
3,2018-03-02,Super Soft Hoodie,75.0,2,150.0
4,2018-03-02,Button-Down Shirt,65.05,7,455.35


Answer for A (fill in blanks):

Looks like we have a row per _____________ per __________.
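A minimal sketch of the aggregation steps in tasks B through E follows, using only the five preview rows shown above (so the totals will not match the full month). The variable names are assumptions, and charting is left out to keep the block self-contained.

```python
import pandas as pd

# The five preview rows from the output above, rebuilt by hand.
sample_sales_df = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-01", "2018-03-01", "2018-03-02", "2018-03-02"],
    "product": ["Button-Down Shirt", "Vintage Logo Tee", "Sticker Pack",
                "Super Soft Hoodie", "Button-Down Shirt"],
    "unit price": [65.05, 15.95, 4.5, 75.0, 65.05],
    "units sold": [2, 1, 1, 2, 7],
    "sales price": [130.1, 15.95, 4.5, 150.0, 455.35],
})

# B) Unique product names, sorted alphabetically.
unique_products = sorted(sample_sales_df["product"].unique())

# C) Total sales, formatted as USD.
total_sales = sample_sales_df["sales price"].sum()
total_sales_usd = f"${total_sales:,.2f}"

# D) Total sales per day (one value per distinct date).
daily_totals = sample_sales_df.groupby("date")["sales price"].sum()

# E) Total sales per product, largest first.
product_totals = (
    sample_sales_df.groupby("product")["sales price"]
    .sum()
    .sort_values(ascending=False)
)

print(unique_products)
print(total_sales_usd)
print(daily_totals)
print(product_totals)
```

Either grouped Series could then be handed to a charting library. With pandas' own `Series.plot(kind="barh")`, the first row is drawn at the bottom, so sorting ascending before plotting may be needed to show the top seller at the top.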

## Challenge 4 (Albums)


Given the provided "albums.csv" file, write Python code to read the CSV data and perform each of the following tasks...

> artist_id | artist_name | album_id | album_title
> --- | ---  | ---  | ---
> 1 | AC/DC | 1 | For Those About To Rock We Salute You
> 1 | AC/DC | 4 | Let There Be Rock
> 3 | Aerosmith | 5 | Big Ones
> 4 | Alanis Morissette | 6 | Jagged Little Pill
> 68 | Miles Davis | 48 | The Essential Miles Davis [Disc 1]
> 68 | Miles Davis | 49 | The Essential Miles Davis [Disc 2]
> 59 | Santana | 197 | Santana - As Years Go By
> 59 | Santana | 198 | Santana Live



A) Print a list of the CSV file's **column names** (i.e. `['artist_id', 'artist_name', 'album_id', 'album_title']`).

B) Print the **number of rows** in the CSV file, excluding the header row (i.e. `347`).

C) Assuming each "artist_id" represents a unique artist, print the **number of unique artists** (i.e. `204`).

D) Assuming each "album_id" represents a unique album, identify **which five artists have the most albums**, and print each artist's name and corresponding album count.

> FYI: the following table depicts the top five artists by album count...
>
> | artist_name   | album_count |
> |---------------|----------|
> | Iron Maiden   | 21       |
> | Led Zeppelin  | 14       |
> | Deep Purple   | 11       |
> | Metallica     | 10       |
> | U2            | 10       |




In [None]:

#
# ALBUMS
#

from pandas import read_csv

df = read_csv("https://gist.githubusercontent.com/s2t2/f2b01347c06258cad28e7331f1e9320f/raw/b61ea33cd18aaadf885eb04544d921c9d9445e21/albums.csv")
print(df.head())

   artist_id artist_name  album_id                            album_title
0          1       AC/DC         1  For Those About To Rock We Salute You
1          2      Accept         2                      Balls to the Wall
2          2      Accept         3                      Restless and Wild
3          1       AC/DC         4                      Let There Be Rock
4          3   Aerosmith         5                               Big Ones
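One way to approach tasks C and D is `nunique()` for the artist count and a `groupby()` for albums per artist. The sketch below uses the eight rows from the sample table in this challenge (album titles omitted), so its counts differ from the full file; `albums_sample_df` and `album_count` are names chosen for illustration.

```python
import pandas as pd

# The eight rows from the sample table above (titles left out for brevity).
albums_sample_df = pd.DataFrame({
    "artist_id": [1, 1, 3, 4, 68, 68, 59, 59],
    "artist_name": ["AC/DC", "AC/DC", "Aerosmith", "Alanis Morissette",
                    "Miles Davis", "Miles Davis", "Santana", "Santana"],
    "album_id": [1, 4, 5, 6, 48, 49, 197, 198],
})

# C) Number of distinct artist ids.
artist_count = albums_sample_df["artist_id"].nunique()

# D) Distinct albums per artist, largest first.
album_counts = (
    albums_sample_df.groupby(["artist_id", "artist_name"])["album_id"]
    .nunique()
    .reset_index(name="album_count")
    .sort_values(by="album_count", ascending=False)
)

# Take the first three here; use head(5) on the full dataset.
top_artists = album_counts.head(3)

print(artist_count)
print(top_artists[["artist_name", "album_count"]])
```

Grouping on both the id and the name keeps two different artists who happen to share a name from being merged, while still carrying the name through to the result.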


## Challenge 5 (Pokemon)


Given the provided ["pokemon-gen-1.csv"](https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/pokemon-gen-1.csv) file, write Python code to read the CSV data and perform each of the following tasks...



a) How many Pokemon are there?

b) Loop through the Pokemon and print the name of each.

> HINT: use `.iterrows()` to loop through the rows

c) Which five Pokemon have the highest base experience? Display their names and base experience and types.

> HINT: use `.sort_values()` to sort the rows by base experience in descending order, then take the first five. Then display the specific info / columns we want.

d) Filter the data to find a Pokemon with a name of "Pikachu". Display information about that Pokemon, ideally as a dictionary. Then also display an image of this Pokemon, using the image URL.

> HINT: after filtering, use `iloc[0]` to grab the first row, then convert it to a dictionary.

e) How many Pokemon are "Electric" type? What are their names?

> HINT: use `.str.contains()` for the filter condition
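The sorting hint for task c can be sketched as follows: sort by `base_experience` in descending order, take the first few rows, then select the columns to display. The sample rows are taken from the dataset preview but deliberately shuffled, and `sample_pokemon_df` is a name assumed for illustration.

```python
import pandas as pd

# Five rows from the dataset preview, in shuffled order, with three columns only.
sample_pokemon_df = pd.DataFrame({
    "name": ["Mew", "Chansey", "Articuno", "Mewtwo", "Dragonite"],
    "base_experience": [300, 395, 290, 340, 300],
    "types": ["Psychic", "Normal", "Ice, Flying", "Psychic", "Dragon, Flying"],
})

# Sort descending, keep the top two rows, then pick the columns of interest.
top_two = (
    sample_pokemon_df
    .sort_values(by="base_experience", ascending=False)
    .head(2)[["name", "base_experience", "types"]]
)

print(top_two)
```

On the full dataset, `head(5)` in place of `head(2)` would give the five rows the task asks for.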


In [None]:
from pandas import read_csv

pokemon_url = "https://raw.githubusercontent.com/prof-rossetti/intro-to-python/main/data/pokemon-gen-1.csv"

pokemon_df = read_csv(pokemon_url)
pokemon_df.head()

Unnamed: 0,name,base_experience,height,weight,types,abilities,held_items,attack,defense,special_attack,special_defense,speed,image_url,cry_url
0,Chansey,395,11,346,Normal,"Natural-Cure, Serene-Grace, Healer","Oval-Stone, Lucky-Egg, Lucky-Punch",5,5,35,105,50,https://raw.githubusercontent.com/PokeAPI/spri...,https://raw.githubusercontent.com/PokeAPI/crie...
1,Mewtwo,340,20,1220,Psychic,"Pressure, Unnerve",,110,90,154,90,130,https://raw.githubusercontent.com/PokeAPI/spri...,https://raw.githubusercontent.com/PokeAPI/crie...
2,Dragonite,300,22,2100,"Dragon, Flying","Inner-Focus, Multiscale","Dragon-Scale, Dragon-Fang",134,95,100,100,80,https://raw.githubusercontent.com/PokeAPI/spri...,https://raw.githubusercontent.com/PokeAPI/crie...
3,Mew,300,4,40,Psychic,Synchronize,Lum-Berry,100,100,100,100,100,https://raw.githubusercontent.com/PokeAPI/spri...,https://raw.githubusercontent.com/PokeAPI/crie...
4,Articuno,290,17,554,"Ice, Flying","Pressure, Snow-Cloak",,85,100,95,125,85,https://raw.githubusercontent.com/PokeAPI/spri...,https://raw.githubusercontent.com/PokeAPI/crie...
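The filtering hints for tasks d and e might look like the sketch below. Because the preview rows contain neither "Pikachu" nor an "Electric" type, it filters for "Mewtwo" and "Flying" instead; the same pattern should apply to the values the tasks name. The `preview_df` name is an assumption.

```python
import pandas as pd

# Preview rows from the output above, reduced to three columns.
preview_df = pd.DataFrame({
    "name": ["Chansey", "Mewtwo", "Dragonite", "Mew", "Articuno"],
    "base_experience": [395, 340, 300, 300, 290],
    "types": ["Normal", "Psychic", "Dragon, Flying", "Psychic", "Ice, Flying"],
})

# d) Filter by exact name, grab the first matching row, convert to a dict.
matches = preview_df[preview_df["name"] == "Mewtwo"]
record = matches.iloc[0].to_dict()

# e) Substring match on the types column (a Pokemon can have several types).
flying_df = preview_df[preview_df["types"].str.contains("Flying")]
flying_count = len(flying_df)
flying_names = flying_df["name"].tolist()

print(record)
print(flying_count, flying_names)
```

In a notebook, the image for the matching row could then be shown with something like `IPython.display.Image(url=record["image_url"])`, assuming the `image_url` column is kept in the DataFrame.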
