---
---

## **Expert Challenge 🦸‍♀️**

### **Plot E=mc^2**
E = mc^2 is the equation from German-born physicist Albert Einstein's theory of special relativity expressing that mass and energy are the same physical entity and can be converted into each other. It is one of the most famous formulas in physics, and it has contributed enormously to nuclear and particle physics.

**For this exercise, you will plot an E=mc^2 graph based on the given requirements.**

---
**TASK: import the necessary libraries**

Run the code below to import the libraries that will be used for this challenge.

In [None]:
import matplotlib.pyplot as plt

---
**TASK: Create a dataset to plot the graph (2 points)**

Plot the E=mc^2 graph for m ranging over the integers 0 to 10. Create **two** lists: one for the values of **m**, and one for the energy **E**, calculated with the formula E=mc^2. Use 3×10^8 m/s for the speed of light, c.

Your lists should have these values:

m - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  
E - [0, 90000000000000000, 180000000000000000, 270000000000000000, 360000000000000000, 450000000000000000, 540000000000000000, 630000000000000000, 720000000000000000, 810000000000000000, 900000000000000000]
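
One thing to watch when turning the formula into Python: the `^` in E=mc^2 is mathematical notation, whereas in Python `^` is the bitwise XOR operator and powers are written with `**`. The sketch below shows the difference for a single mass value; the variable names are illustrative only and not part of the challenge.

```python
c = 3 * 10**8          # speed of light in m/s, using ** for powers
mass = 2               # an example mass value

energy = mass * c**2   # E = m * c^2
wrong = mass * c ^ 2   # ^ is bitwise XOR in Python, so this is NOT a square

print(energy)          # 180000000000000000
print(wrong)           # 600000002
```

Keeping `c` as an integer (rather than `3e8`) keeps the results as exact integers, which matches the expected values listed above.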

In [None]:
# CODE HERE

In [None]:
# SOLUTION

m = list(range(0, 11))   # masses 0 to 10
c = 3 * 10**8            # speed of light in m/s

# Energy for each mass: E = m * c**2
E = [mass * c**2 for mass in m]

---
**TASK: Recreate the graph below which maps out E=mc^2 (2 points)**  

Using the lists m and E from the previous task, plot the graph with **matplotlib**.

![Physics_Graph_1.png](Assets/Physics_Graph_1.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

plt.plot(m,E,color='red',lw=5)
plt.title("E=mc^2")
plt.xlabel("Mass in Grams")
plt.ylabel("Energy in Joules")
plt.show()

---
**TASK: Plot E=mc^2 on a logarithmic scale on the y axis (3 points)**  

Using **matplotlib**, plot the logarithmic graph as shown below.

Tips:
1. Look up yscale() to set the logarithmic scale
2. Look up grid() to make the grid of the plotted graph match the picture  
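
As a rough illustration of the two functions mentioned in the tips, here is a minimal sketch on made-up data (the lists `x` and `y` are assumptions for the example, not the challenge data). It uses the object-oriented calls `set_yscale` and `grid`, which correspond to `plt.yscale` and `plt.grid`.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 100, 1000, 10000, 100000]   # made-up values spanning several orders of magnitude

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_yscale("log")                 # same effect as plt.yscale("log")
ax.grid(which="both", axis="y")      # major and minor grid lines, y axis only
plt.show()
```

Keep in mind that zero has no logarithm, so a value of exactly 0 has no finite position on a logarithmic axis.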

![Physics_Graph_2.png](Assets/Physics_Graph_2.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

plt.plot(m,E,color='red',lw=5)
plt.title("E=mc^2")
plt.xlabel("Mass in Grams")
plt.ylabel("Energy in Joules")
plt.xlim(0,10)

# LOG SCALE
plt.yscale("log")
plt.grid(which='both',axis='y')

plt.show()

---

### **Plot Inflation Graph**
In economics, inflation is the rate at which prices increase over a given period of time. Inflation is typically a broad measure, such as the overall increase in prices or the increase in the cost of living in a country. For example, an inflation rate of 5% per year means that shopping that costs you $100 today would have cost only about $95 a year ago. Too much inflation is generally considered bad for an economy, but too little is also considered harmful, because economists believe that a small amount of inflation can help drive economic growth. Thus, the acceptable inflation rate is usually set around 2% or slightly below.
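
The $100 and $95 figures can be checked with a short calculation. This is only a sketch of the arithmetic, assuming a constant 5% annual rate; the variable names are made up for the example.

```python
inflation_rate = 0.05        # 5% per year
price_today = 100.0          # today's cost in dollars

# Prices grow by a factor of (1 + rate) each year
price_last_year = price_today / (1 + inflation_rate)
price_next_year = price_today * (1 + inflation_rate)

print(round(price_last_year, 2))   # 95.24
print(round(price_next_year, 2))   # 105.0
```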

**For this exercise, we will give you the data to plot the inflation rates of Malaysia and the world average.**

---
**TASK: Run the cell below to create the lists for plotting**  

The data used for plotting is given below. Run the cell to initialise the lists.

In [None]:
labels = list(range(2012, 2022, 1))
malaysia = [1.66, 2.11, 3.14, 2.10, 2.09, 3.87, 0.88, 0.66, -1.14, 2.48]
world = [3.73, 2.62, 2.35, 1.43, 1.55, 2.19, 2.44, 2.19, 1.92, 3.42]

---
**TASK: Recreate the graph below (3 points)**  

Using **matplotlib**, create two separate line graphs: the first shows Malaysia's inflation rate, and the second shows the world average inflation rate.

Tips:
1. Set figure size to (12,8)

![Expert_Challenge_Plot_1.png](Assets/Expert_Challenge_Plot_1.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

fig,axes = plt.subplots(nrows=2,ncols=1,figsize=(12,8))

axes[0].plot(labels, malaysia)
axes[0].set_title("Malaysia Inflation Rate")
axes[1].plot(labels,world)
axes[1].set_title("World Average Inflation Rate")

plt.show()

---
**TASK: Try to recreate the plot below that uses twin axes (7 points)**  

Using **matplotlib**, find a way to plot two different y-axes on the same graph.

Tips:
1. Graph configuration 
    - Figure size: (12,8)
    - Plotting line width: 2
    - Spine line width: 4
    - Label font size: 18
    - Y ticks font size: 15

2. Look up twinx() and spines  
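
To give a feel for what `twinx()` and `spines` do, here is a minimal sketch on made-up data. The lists, colours, and labels are assumptions for illustration and are unrelated to the inflation data.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
temperature = [30, 31, 29, 28]      # made-up values for the left axis
rainfall = [200, 150, 300, 250]     # made-up values for the right axis

fig, ax_left = plt.subplots()
ax_left.plot(months, temperature, color="green")
ax_left.set_ylabel("Temperature", color="green")

ax_right = ax_left.twinx()          # second axes sharing the same x axis
ax_right.plot(months, rainfall, color="purple")
ax_right.set_ylabel("Rainfall", color="purple")

# Each axes has four spines: 'left', 'right', 'top' and 'bottom'
ax_right.spines["right"].set_linewidth(3)
plt.show()
```

Because the two axes are drawn on top of each other, styling a spine on the axes that was created last tends to be the one that shows.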

![Expert_Challenge_Plot_2.png](Assets/Expert_Challenge_Plot_2.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

fig, ax1 = plt.subplots(figsize=(12,8))

ax1.plot(labels,malaysia, lw=2, color="blue")
ax1.set_ylabel("Malaysia", fontsize=18, color="blue")

ax1.spines['left'].set_color('blue')
ax1.spines['left'].set_linewidth(4)

# Colour and size the left y-axis tick labels
ax1.tick_params(axis='y', labelcolor='blue', labelsize=15)
    
ax2 = ax1.twinx()
ax2.plot(labels,world, lw=2, color="red")
ax2.set_ylabel("World Average", fontsize=18, color="red")

ax2.spines['right'].set_color('red')
ax2.spines['right'].set_linewidth(4)

# Colour and size the right y-axis tick labels
ax2.tick_params(axis='y', labelcolor='red', labelsize=15)

ax1.set_title("Inflation Rate")

plt.show()

---

### **Exploratory Data Analysis with Seaborn**
Credit scorecards are a common risk-control method in the financial industry. They use personal information and data submitted by credit card applicants to predict the probability of future defaults and credit card borrowing, so that the bank can decide whether to issue a credit card to the applicant. Credit scores quantify the magnitude of risk objectively.

**For this exercise, we will use the credit card data provided by [Kaggle](https://www.kaggle.com/datasets/rikdifos/credit-card-approval-prediction).**

---

**TASK: Run the cell below to load the dataset**

The cell below will import the dataset for the challenge.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("https://raw.githubusercontent.com/ryoshi007/Datasets/main/Credit_Card_Data.csv")
df.head()

---
Feature Information:

<table>
<thead>
<tr>
<th>application_record.csv</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Feature name</td>
<td>Explanation</td>
<td>Remarks</td>
</tr>
<tr>
<td><code>ID</code></td>
<td>Client number</td>
<td></td>
</tr>
<tr>
<td><code>CODE_GENDER</code></td>
<td>Gender</td>
<td></td>
</tr>
<tr>
<td><code>FLAG_OWN_CAR</code></td>
<td>Is there a car</td>
<td></td>
</tr>
<tr>
<td><code>FLAG_OWN_REALTY</code></td>
<td>Is there a property</td>
<td></td>
</tr>
<tr>
<td><code>CNT_CHILDREN</code></td>
<td>Number of children</td>
<td></td>
</tr>
<tr>
<td><code>AMT_INCOME_TOTAL</code></td>
<td>Annual income</td>
<td></td>
</tr>
<tr>
<td><code>NAME_INCOME_TYPE</code></td>
<td>Income category</td>
<td></td>
</tr>
<tr>
<td><code>NAME_EDUCATION_TYPE</code></td>
<td>Education level</td>
<td></td>
</tr>
<tr>
<td><code>NAME_FAMILY_STATUS</code></td>
<td>Marital status</td>
<td></td>
</tr>
<tr>
<td><code>NAME_HOUSING_TYPE</code></td>
<td>Way of living</td>
<td></td>
</tr>
<tr>
<td><code>DAYS_BIRTH</code></td>
<td>Birthday</td>
<td>Count backwards from current day (0), -1 means yesterday</td>
</tr>
<tr>
<td><code>DAYS_EMPLOYED</code></td>
<td>Start date of employment</td>
<td>Count backwards from current day (0). If positive, the person is currently unemployed.</td>
</tr>
<tr>
<td><code>FLAG_MOBIL</code></td>
<td>Is there a mobile phone</td>
<td></td>
</tr>
<tr>
<td><code>FLAG_WORK_PHONE</code></td>
<td>Is there a work phone</td>
<td></td>
</tr>
<tr>
<td><code>FLAG_PHONE</code></td>
<td>Is there a phone</td>
<td></td>
</tr>
<tr>
<td><code>FLAG_EMAIL</code></td>
<td>Is there an email</td>
<td></td>
</tr>
<tr>
<td><code>OCCUPATION_TYPE</code></td>
<td>Occupation</td>
<td></td>
</tr>
<tr>
<td><code>CNT_FAM_MEMBERS</code></td>
<td>Family size</td>
<td></td>
</tr>
</tbody>
</table>
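
The "count backwards" convention for `DAYS_BIRTH` and `DAYS_EMPLOYED` can be easier to see on a tiny example. The sketch below uses a made-up three-row dataframe (all values are assumptions for illustration) and approximates a year as 365 days.

```python
import pandas as pd

toy = pd.DataFrame({
    "DAYS_BIRTH": [-10950, -14600, -21900],    # made-up: born 30, 40 and 60 years ago
    "DAYS_EMPLOYED": [-730, 365243, -3650],    # made-up: a positive value means unemployed
})

# Negative day counts become positive ages in years (leap years ignored)
toy["AGE_YEARS"] = -toy["DAYS_BIRTH"] / 365

# Keep only rows with a negative (past) employment start date;
# .copy() gives an independent dataframe that is safe to modify
employed = toy[toy["DAYS_EMPLOYED"] < 0].copy()

print(toy["AGE_YEARS"].tolist())   # [30.0, 40.0, 60.0]
print(len(employed))               # 2
```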

---
**TASK: Create the scatter plot shown below (4 points)**  

The scatterplot shows the relationship between the days employed and the age of the person (DAYS_BIRTH) for people who are not unemployed. To reproduce this chart, you must first remove the unemployed people from the dataset. Also note the sign of both axes: the values are transformed to be positive.

Tips:
1. Remove the unemployed people
2. Make both DAYS_EMPLOYED and DAYS_BIRTH values positive
3. Graph configuration:  
    - Figure size: (12,8)
    - Line width: 0
    - Alpha: 0.01

    <br>

![EDA_Graph_1.png](Assets/EDA_Graph_1.png)

In [None]:
# RUN ME FIRST

import warnings
warnings.simplefilter('ignore')

In [None]:
# CODE HERE

In [None]:
# SOLUTION

plt.figure(figsize=(12,8))

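# Keep employed people only; .copy() so the columns can be modified safely
employed = df[df['DAYS_EMPLOYED']<0].copy()

# Flip the signs so that both axes are positive
employed['DAYS_EMPLOYED'] = -1*employed['DAYS_EMPLOYED']
employed['DAYS_BIRTH'] = -1*employed['DAYS_BIRTH']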
sns.scatterplot(y='DAYS_EMPLOYED',x='DAYS_BIRTH',data=employed,
                alpha=0.01,linewidth=0)
plt.show()

---
**TASK: Create the distribution plot shown below (2 points)**  

Work out how to calculate "Age in Years" from one of the columns in the dataframe, then create the distribution plot using seaborn.

Tips:
1. Graph configuration:  
    - Figure size: (8,4)
    - Line width: 2
    - Edge color: Black
    - Bar color: Red
    - Bins value: 45
    - Alpha: 0.4

    <br>

![EDA_Graph_2.png](Assets/EDA_Graph_2.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

plt.figure(figsize=(8,4))

df['YEARS'] = -1*df['DAYS_BIRTH']/365
sns.histplot(data=df,x='YEARS',linewidth=2,edgecolor='black',
             color='red',bins=45,alpha=0.4)
plt.xlabel("Age in Years")
plt.show()

---
**TASK: Create the categorical plot shown below (3 points)**  

This plot covers only the bottom half of income earners in the dataset. It shows a boxplot of total income for each category of the NAME_FAMILY_STATUS column, with the "FLAG_OWN_REALTY" column as the hue.

Tips:
1. Graph configuration:  
    - Figure size: (12,5)

    <br>

2. Legend configuration:  
    - Bbox to anchor: (1.05, 1)
    - Loc: 2
    - Border axes pad: 0
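
One possible way to pick out the "bottom half" of earners mentioned in the task is pandas' `nsmallest`. The sketch below applies it to a made-up column of six incomes; the dataframe and column name are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"income": [50, 10, 40, 20, 30, 60]})   # made-up incomes

half = len(toy) // 2                                 # number of rows in the bottom half
bottom_half = toy.nsmallest(n=half, columns="income")

print(bottom_half["income"].tolist())   # [10, 20, 30]
```

Filtering on the median is another option, though ties at the median can make the two halves unequal in size.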

![EDA_Graph_3.png](Assets/EDA_Graph_3.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

plt.figure(figsize=(12,5))

bottom_half_income = df.nsmallest(n=int(0.5*len(df)),columns='AMT_INCOME_TOTAL')
sns.boxplot(x='NAME_FAMILY_STATUS',y='AMT_INCOME_TOTAL',data=bottom_half_income,hue='FLAG_OWN_REALTY')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,title='FLAG_OWN_REALTY')
plt.title('Income Totals per Family Status for Bottom Half of Earners')
plt.show()

---
**TASK: Create the heat map shown below (3 points)**  

This heatmap shows the correlation between the columns in the dataframe. You can get the correlations with .corr(). Note that the FLAG_MOBIL column has NaN correlation with every other column, so you should drop it before calling .corr().

Tips:
1. Drop the column with NaN correlation
2. Look up .corr() in pandas
3. Graph configuration:  
    - cmap: viridis
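
A common reason for a column to have NaN correlation with everything is that it never varies, so its standard deviation is zero. Whether that is the cause for FLAG_MOBIL is an assumption here; the sketch below only demonstrates the effect on a made-up dataframe with a constant `flag` column.

```python
import pandas as pd

toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 5, 9],
    "flag": [1, 1, 1, 1],     # constant column (made-up stand-in)
})

corr = toy.corr()
print(corr["flag"].isna().all())      # True: every correlation with 'flag' is NaN

# Dropping the constant column leaves a fully defined correlation matrix
cleaned = toy.drop("flag", axis=1).corr()
print(cleaned.isna().any().any())     # False
```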

![EDA_Graph_4.png](Assets/EDA_Graph_4.png)

In [None]:
# CODE HERE

In [None]:
# SOLUTION

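# numeric_only=True skips any non-numeric columns (available from pandas 1.5, required on pandas 2.0+)
sns.heatmap(df.drop('FLAG_MOBIL',axis=1).corr(numeric_only=True),cmap="viridis")
plt.show()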

---
---

## **Congratulations 🏅**
You have completed all of the hard challenges. Truly legendary! 😮   

---
---

# **Conclusion 👋**
Congratulations on reaching the end of the "Data Visualization" station. We hope these tutorials and challenges have increased your knowledge of the matplotlib and seaborn libraries. What we have included here is just the tip of the iceberg, and it would be great if you explored the rest of the libraries on your own. If you could not complete some of the challenges, please don't give up! This is part of the learning process, and nobody aces it on the first try. Forget the mistake, remember the lesson. Keep your spirits up and get ready to explore the other fun stations. Goodbye!

---