# Tech Bytes 1: Introduction to Python

Welcome to the first of our Tech Bytes! This time we will take a look at how we can get data from the internet using an API, how we can do some analysis with a library named `pandas`, and how we can plot a few things from the data!

But before we get started, we have to make sure that your environment is ready to go. We are using something called a `Jupyter notebook`, which is an interactive way to work with Python. This notebook is hosted on `Google Colab`, which lets you get up and running with Python without having to install anything.

To run the code, click the play button next to a cell, or select the cell and press Ctrl+Enter.

To make sure that you are ready for the workshop, run the next two cells.

The first one should print `Hello NNF!` below the cell once you run it.

The second one should print `You are ready to go!`

In case any of them fail or you don't see any output, reach out to Erick before the workshop :)!


In [None]:
greeting = "Hello "
name = "NNF!"

print(greeting + name)

In [None]:
import requests
import pandas as pd
import matplotlib.pyplot as plt


print("You are ready to go!")

# STOP!

You are ready to go, my friend. Don't go any further if you want to avoid spoilers ;)

We'll see you in the workshop!

# Requesting data

The internet is built on requests. Every time you go to a webpage in your web browser, a `request` is sent to the website asking for content, and you get a response back. The response can be an `.html` page, but it can also be a `.json` file or something else.

There are different types of requests, such as `GET`, `POST`, `PUT`, `PATCH`, and `DELETE`, but talking about their differences is out of the scope of this workshop. It suffices to know that if you want to get data, you should use `GET`.

For example, when you go to `https://google.com` in your browser, your browser sends a `GET` request to Google and receives an `.html` page. Your browser then renders it, makes it look nice, and presents it to you.
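In `requests`, a call like `requests.get(url)` builds a `GET` request and sends it in one go. As a minimal sketch, you can build the same kind of request *without* sending it (so no internet connection is needed) and peek at what it contains, using `requests.Request(...).prepare()`:

```python
import requests

# Build (but do not send) a GET request for Google's home page
prepared = requests.Request("GET", "https://google.com").prepare()

print(prepared.method)  # the HTTP method of the request
print(prepared.url)     # requests normalizes a bare domain by adding a trailing "/"
```

To actually send a prepared request you could use `requests.Session().send(prepared)`; `requests.get(...)` does both steps for you, which is what we'll use in this workshop.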

## Our little project: Taylor Swift

Believe me when I say I tried to find a more "appropriate" dataset, but this was the funniest and easiest API I could find.
So, we're going to work with the Taylor Swift API.

This API has three endpoints (think of them as pages in a website):
- /albums
- /songs
- /lyrics

We're going to use the requests library to interact with the API.
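The next cells build a URL, send a `GET` request, and parse the JSON by hand. A small helper can wrap those three steps; the sketch below is hypothetical (the names `endpoint_url` and `fetch_json` are mine), and the `timeout` and `raise_for_status()` check are my own additions, not something the API requires. They just make failures loud instead of silent.

```python
import requests

BASE_URL = "https://taylor-swift-api.sarbo.workers.dev"


def endpoint_url(endpoint: str) -> str:
    """Join the base URL and an endpoint such as "/songs"."""
    return BASE_URL + endpoint


def fetch_json(endpoint: str, timeout: float = 10.0):
    """Hypothetical helper: GET an endpoint and return the parsed JSON.

    raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    so a failed request stops here instead of handing back an error body.
    """
    resp = requests.get(endpoint_url(endpoint), timeout=timeout)
    resp.raise_for_status()
    return resp.json()


# Usage (needs internet access):
# songs = fetch_json("/songs")
# albums = fetch_json("/albums")
```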

Let's start with the songs endpoint:

In [None]:
BASE_URL = "https://taylor-swift-api.sarbo.workers.dev"

resp_songs = requests.get(BASE_URL + "/songs")

songs_json = resp_songs.json()

print(songs_json[0])

That piece of code fetches a list of Taylor Swift songs from an API and prints the first song in the list.

Let's do the same thing for the `/albums` endpoint.

In [None]:
resp_albums = requests.get(BASE_URL + "/albums")

albums_json = resp_albums.json()
print(albums_json[0])

Now we will put this data into a pandas dataframe. What is a dataframe? It is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like an Excel spreadsheet.
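The songs JSON appears to be a list of dictionaries, one per song (that's why `songs_json[0]` works). `pd.DataFrame` turns each dictionary into a row and each key into a column. Here is a small sketch with made-up rows; the titles and column names are hypothetical, not taken from the API:

```python
import pandas as pd

# Hypothetical records shaped like a JSON list of objects
records = [
    {"title": "Song A", "album_id": 1, "length_s": 210},
    {"title": "Song B", "album_id": 1, "length_s": 185},
    {"title": "Song C", "album_id": 2, "length_s": 240},
]

example_df = pd.DataFrame(records)

print(example_df)
print(example_df.dtypes)  # each column can have its own type
```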


In [None]:
df = pd.DataFrame(songs_json)
df.head()

The `head` method shows the first 5 rows of the dataframe.
Now you have a better idea of what our data looks like.
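`head` also takes an optional number of rows, and `tail` does the same from the end. A quick sketch on a made-up dataframe:

```python
import pandas as pd

# Hypothetical dataframe with 10 rows numbered 0..9
numbers = pd.DataFrame({"n": range(10)})

print(numbers.head())   # first 5 rows (the default)
print(numbers.head(3))  # first 3 rows
print(numbers.tail(2))  # last 2 rows
print(numbers.columns)  # the column labels
```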

## Exercise

How many songs does Taylor Swift have according to the dataset?

Tip: Ask ChatGPT something like `I have a dataframe with songs data. How can I find the number of songs in the dataset?`

In [25]:
# Your code goes here



Let's do the same for the albums data:

In [None]:
df_albums = pd.DataFrame(resp_albums.json())

df_albums.head()

## Exercise

How many albums does Taylor Swift have?

You shouldn't need ChatGPT for this one, but it's OK if you do. This is a safe space <3

In [27]:
# Your code goes here


# Merging data

Now we have two dataframes with different data, but if you look closely you will see that they share a common column: `album_id`. We can merge the two dataframes on this column.
This will give us a single dataframe with all the data combined.

But how can we do this? Let's see:

In [None]:
# Join each song to its album using the shared album_id column
df = pd.merge(df, df_albums,
              on="album_id",
              suffixes=("_song", "_album"))

df.head()

Does this look like dark magic to you? It does to me.
Want to know how I did it?

Well, I asked ChatGPT something like:
```text
I have a dataframe called df with the following columns: song_id, album_id, song_name, album_name. I want to merge it with another dataframe called df_albums with the following columns: album_id, album_name, release_date. How can I merge them?
```

and it gave me the code :D!
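Less dark magic: here is the same kind of merge on two tiny made-up dataframes (the rows and titles are hypothetical). Both frames have a `title` column, so `suffixes` renames them to `title_song` and `title_album`, while the shared key `album_id` appears only once:

```python
import pandas as pd

# Hypothetical songs: each one points to an album via album_id
songs = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "album_id": [1, 1, 2],
})

# Hypothetical albums: one row per album_id
albums = pd.DataFrame({
    "album_id": [1, 2],
    "title": ["Album One", "Album Two"],
    "release_date": ["2001-01-01", "2002-06-15"],
})

# Inner join (the default): keep songs whose album_id exists in albums
merged = pd.merge(songs, albums, on="album_id", suffixes=("_song", "_album"))
print(merged)
```

Each song row picks up the columns of its album. Passing `how="left"` instead would keep every song even if its album were missing, filling the album columns with `NaN`.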

Let's clean up the data a little bit:

In [29]:
# Remove the album_id column because we don't need it anymore
df = df.drop(columns=["album_id"])

## Exercise

Remove the columns `song_id` and `artist_id` from the DataFrame `df` because we don't need them anymore.

In [30]:
# Your code goes here



## Technicalities

We need to transform the `release_date` column into datetime objects. This will allow us to work with the dates in a more convenient way.
These are mainly details, so you don't have to worry too much about them. Just run the next cell :)
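Curious what the conversion buys us? Once a column holds real dates instead of text, pandas can compare, sort, and take them apart. A sketch on sample date strings (the exact format of the API's `release_date` may differ); `errors="coerce"` turns anything unparseable into `NaT` ("not a time") instead of raising an error:

```python
import pandas as pd

# Hypothetical release dates stored as text, plus one bad value
raw = pd.Series(["2020-01-15", "2021-07-04", "not a date"])

dates = pd.to_datetime(raw, errors="coerce")

print(dates)          # the bad value becomes NaT
print(dates.dt.year)  # the .dt accessor exposes parts of each date
print(dates.max())    # dates can be compared, so max() is the latest one
```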

In [None]:
# Transform the release_date column to a datetime object
df["release_date"] = pd.to_datetime(df["release_date"])

df.head()

# Exporting to CSV
You may be thinking: "Oh! But how can I share this with my friends who only use Excel?"
Fear not, my friend, pandas has you covered:

In [None]:
df.to_csv("taylor_swift_songs.csv", index=False)

You can download the file (in Colab it shows up in the Files panel on the left) and open it in Excel if you want to :)
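A CSV is plain text, so a round trip loses some type information: dates come back as strings unless you ask `pd.read_csv` to parse them with `parse_dates`. A sketch of writing and reading back a small made-up dataframe (the file name `example_songs.csv` is hypothetical):

```python
import pandas as pd

# Hypothetical data with a datetime column
original = pd.DataFrame({
    "title": ["Song A", "Song B"],
    "release_date": pd.to_datetime(["2020-01-15", "2021-07-04"]),
})

# index=False skips writing the 0, 1, 2, ... row labels as an extra column
original.to_csv("example_songs.csv", index=False)

# Read it back, asking pandas to turn release_date into datetimes again
loaded = pd.read_csv("example_songs.csv", parse_dates=["release_date"])

print(loaded.dtypes)
```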

# Plotting!

Now we get to the fun part: making nice pictures from our data. We will use the `matplotlib` library for this.

Let's start by plotting the number of songs per album.
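This plot is built from `value_counts()`, which counts how many rows share each value and sorts the counts from most to least frequent. Since each row of `df` is a song, counting album titles gives songs per album. A sketch with made-up titles:

```python
import pandas as pd

# Hypothetical album title for each song (one row per song)
album_titles = pd.Series([
    "Album One", "Album Two", "Album One",
    "Album One", "Album Two", "Album Three",
])

counts = album_titles.value_counts()
print(counts)  # Album One 3, Album Two 2, Album Three 1

# sort_index() orders the result by album title instead of by count
print(counts.sort_index())
```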

In [None]:
# Plot number of songs per album
df["title_album"].value_counts().plot(kind="bar")

That was super easy because pandas has a built-in `plot` method (it uses matplotlib behind the scenes), but you can also build the plot more manually with `matplotlib` directly.
This gives you more control over colors, labels, sizes, and so on.

In [None]:
# Plot number of songs per album using matplotlib

album_value_counts = df["title_album"].value_counts()
print(album_value_counts)

# Start a new figure that is wide enough for the album names
plt.figure(figsize=(10, 6))

# Draw the bars with different colors (matplotlib cycles through the list
# if there are more albums than colors) and a narrower width than the default
plt.bar(album_value_counts.index,
        album_value_counts,
        color=['red', 'blue', 'purple', 'green', 'lavender', 'pink'],
        width=0.6)

# Making a title
plt.title("Number of songs per album")
# Making the x-axis label
plt.xlabel("Album")
# Making the y-axis label
plt.ylabel("Number of songs")
# Write the album names vertically so they don't overlap
plt.xticks(rotation=90)

plt.show()
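The cell above uses `plt.title`, `plt.xticks`, and friends, which act on the "current" plot. matplotlib also has an object-oriented style where you keep hold of the figure and axes explicitly, which can be easier to follow once a notebook has several plots. A sketch of the same kind of bar chart on made-up counts:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical songs-per-album counts
counts = pd.Series({"Album One": 3, "Album Two": 2, "Album Three": 1})

# Create a figure and a single set of axes to draw on
fig, ax = plt.subplots(figsize=(8, 5))

ax.bar(counts.index, counts.values, color=["red", "blue", "purple"], width=0.6)
ax.set_title("Number of songs per album")
ax.set_xlabel("Album")
ax.set_ylabel("Number of songs")
ax.tick_params(axis="x", labelrotation=90)  # vertical album names

fig.tight_layout()  # make room for the rotated labels
plt.show()
```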


Let's add a column that tells you the month in which each album was released:

In [None]:
df["month_release_date"] = df["release_date"].dt.month

df.head()
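`.dt.month` gives month numbers from 1 (January) to 12 (December). If you prefer names on your plots, `.dt.month_name()` returns them as text. A sketch on sample dates (hypothetical, not from the API):

```python
import pandas as pd

# Sample datetimes
dates = pd.Series(pd.to_datetime(["2020-01-15", "2020-10-24", "2021-07-04"]))

print(dates.dt.month)         # 1, 10, 7
print(dates.dt.month_name())  # January, October, July
```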

## Albums per month

The following code first groups the data by `month_release_date` and counts the unique album titles (`title_album`) in each month to determine how many albums were released. Then it creates a bar plot displaying this information.

In [None]:

# Group by 'month_release_date' and count distinct 'title_album'
album_per_month = df.groupby('month_release_date')['title_album'].nunique()

# Plotting the bar chart
plt.figure(figsize=(10,6))
album_per_month.plot(kind='bar')

plt.title('Number of Albums Released per Month')
plt.xlabel('Month')
plt.ylabel('Number of Albums')
plt.xticks(rotation=45)
plt.show()
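Why `nunique()` and not `count()`? After the merge there is one row per song, so each album shows up many times and counting rows would count songs. `nunique()` counts each distinct album title only once. Also, months with no releases simply don't appear in the grouped result; `reindex` can add them back as zeros. A sketch with made-up rows:

```python
import pandas as pd

# Hypothetical merged data: one row per song
songs = pd.DataFrame({
    "title_album": ["Album One", "Album One", "Album One", "Album Two", "Album Three"],
    "month_release_date": [10, 10, 10, 10, 7],
})

by_month = songs.groupby("month_release_date")["title_album"]

print(by_month.count())    # songs per month: 7 -> 1, 10 -> 4
print(by_month.nunique())  # albums per month: 7 -> 1, 10 -> 2

# Include every month 1..12, using 0 for months without albums
all_months = by_month.nunique().reindex(range(1, 13), fill_value=0)
print(all_months)
```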

## Exercise

Plot the number of albums released per day of the month.

Can you also do it for the day of the week?

In [None]:
# Your code goes here
