
In this checkpoint, we will work with the 'Climate change in Africa' dataset provided by the U.S. Global Change Research Program.

Dataset description: This dataset contains historical daily minimum, maximum and average temperatures for 5 African countries (Egypt, Tunisia, Cameroon, Senegal, Angola) between 1980 and 2023.

➡️ Dataset link

https://i.imgur.com/w2czdso.jpg


Instructions

1. Load the dataset into a data frame using Python.
2. Clean the data as needed.
3. Plot a line chart to show the average temperature fluctuations in Tunisia and Cameroon. Interpret the results.
4. Zoom in to include only data between 1980 and 2005, and try to customize the axis labels.
5. Create histograms to show the temperature distribution in Senegal between [1980,2000] and [2000,2023] (in the same figure). Describe the results.
6. Select the best chart to show the average temperature per country.
7. Make your own questions about the dataset and try to answer them using the appropriate visuals.

In [None]:
#install plotly
!pip install plotly



In [None]:
import plotly.express as px
import pandas as pd

In [None]:
#read in the dataset
df = pd.read_csv('/content/Africa_climate_change.csv')

# 2. **Cleaning the data**

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 464815 entries, 0 to 464814
Data columns (total 6 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   DATE     464815 non-null  object 
 1   PRCP     177575 non-null  float64
 2   TAVG     458439 non-null  float64
 3   TMAX     363901 non-null  float64
 4   TMIN     332757 non-null  float64
 5   COUNTRY  464815 non-null  object 
dtypes: float64(4), object(2)
memory usage: 21.3+ MB


In [None]:
df.head()

Unnamed: 0,DATE,PRCP,TAVG,TMAX,TMIN,COUNTRY
0,19800101 000000,,54.0,61.0,43.0,Tunisia
1,19800101 000000,,49.0,55.0,41.0,Tunisia
2,19800101 000000,0.0,72.0,86.0,59.0,Cameroon
3,19800101 000000,,50.0,55.0,43.0,Tunisia
4,19800101 000000,,75.0,91.0,,Cameroon
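
The `DATE` column is stored as text in the form `YYYYMMDD HHMMSS`, so pandas reports it as `object`. A minimal sketch of parsing it into real timestamps is given here on a toy frame (the first two rows mirror the output above, the third is invented for illustration); the `sample` and `in_range` names are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy rows in the same shape as the notebook's data; the third row is made up.
sample = pd.DataFrame({
    "DATE": ["19800101 000000", "19800101 000000", "20051231 000000"],
    "TAVG": [54.0, 72.0, 60.0],
    "COUNTRY": ["Tunisia", "Cameroon", "Tunisia"],
})

# Parse "YYYYMMDD HHMMSS" strings into datetime64 values.
sample["DATE"] = pd.to_datetime(sample["DATE"], format="%Y%m%d %H%M%S")

# Year-based filtering becomes explicit instead of relying on string ordering.
in_range = sample[sample["DATE"].dt.year.between(1980, 2005)]
print(in_range)
```

With a datetime column, plotting libraries can also treat the x-axis as a time axis rather than as a list of labels.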


In [None]:
df.describe()

Unnamed: 0,PRCP,TAVG,TMAX,TMIN
count,177575.0,458439.0,363901.0,332757.0
mean,0.120941,77.029838,88.713969,65.548262
std,0.486208,11.523634,13.042631,11.536547
min,0.0,-49.0,41.0,12.0
25%,0.0,70.0,81.0,58.0
50%,0.0,80.0,90.0,68.0
75%,0.01,85.0,99.0,74.0
max,19.69,110.0,123.0,97.0


In [None]:
#drop every row that has a missing value in any column
df = df.dropna()

In [None]:
df.isnull().sum()

DATE       0
PRCP       0
TAVG       0
TMAX       0
TMIN       0
COUNTRY    0
dtype: int64
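
Calling `dropna()` with no arguments removes a row if any column is missing, and `df.info()` above shows `PRCP` is non-null in only 177575 of 464815 rows, so many temperature readings are discarded along with it. One possible alternative, sketched here on an invented toy frame, is to drop rows only on the columns a given chart needs; the `toy`, `all_complete` and `tavg_only` names are hypothetical.

```python
import pandas as pd

# Invented toy data: PRCP is missing far more often than TAVG.
toy = pd.DataFrame({
    "PRCP": [None, 0.0, None, 0.1],
    "TAVG": [54.0, 72.0, None, 80.0],
    "COUNTRY": ["Tunisia", "Cameroon", "Egypt", "Senegal"],
})

# Share of missing values per column, to see which column drives the row loss.
missing_share = toy.isna().mean()

# Blanket dropna(): keeps only rows that are complete in every column.
all_complete = toy.dropna()

# Targeted dropna(): keeps every row that has a TAVG reading.
tavg_only = toy.dropna(subset=["TAVG"])

print(len(all_complete), len(tavg_only))
```

Whether the targeted version is preferable depends on the question being asked; charts that use only `TAVG` do not need `PRCP` to be present.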

# Plotting a line chart to show the average temperature fluctuations in Tunisia and Cameroon.

In [None]:
#keep only the rows dated 1980-2005 (DATE is a string, so it is compared against '2006')
df1 = df[df["DATE"]<'2006']

In [None]:
#keep only the rows for Tunisia and Cameroon
df2 = df1[(df1["COUNTRY"] == 'Tunisia') | (df1["COUNTRY"] == 'Cameroon')]

In [None]:
#code to draw a line chart of Tunisia and Cameroon, one colored line per country
fig = px.line(df2, x="DATE", y="TAVG", color="COUNTRY", title='Average temperature fluctuation in Tunisia and Cameroon')

In [None]:
#show the line chart
fig.show()
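
The data holds several readings per country on the same day, so a line drawn through the raw rows tends to zigzag. A minimal sketch of averaging per day and country first, and of customizing the axis labels, follows. The helper names `daily_mean_by_country` and `build_line_chart` are hypothetical, the toy readings are invented, and the y-axis label omits units because the dataset description does not state them.

```python
import pandas as pd


def daily_mean_by_country(frame, countries):
    """Average TAVG over all readings that share a date and a country."""
    subset = frame[frame["COUNTRY"].isin(countries)]
    return subset.groupby(["DATE", "COUNTRY"], as_index=False)["TAVG"].mean()


def build_line_chart(daily):
    """One line per country, with custom axis labels."""
    import plotly.express as px  # imported lazily; only this step needs plotly

    fig = px.line(daily, x="DATE", y="TAVG", color="COUNTRY",
                  title="Average temperature fluctuation in Tunisia and Cameroon")
    fig.update_layout(xaxis_title="Date", yaxis_title="Average temperature")
    return fig


# Invented toy readings: two Tunisia readings fall on the same day.
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["1980-01-01", "1980-01-01", "1980-01-01", "1980-01-02"]),
    "TAVG": [54.0, 50.0, 72.0, 52.0],
    "COUNTRY": ["Tunisia", "Tunisia", "Cameroon", "Tunisia"],
})
daily = daily_mean_by_country(toy, ["Tunisia", "Cameroon"])
print(daily)
```

In the notebook, `build_line_chart(daily).show()` would render the figure; the two same-day Tunisia readings collapse into a single averaged point.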

In [None]:
#keep only the rows dated 1980-2000
df3 = df[df["DATE"]<'2001']
df3

Unnamed: 0,DATE,PRCP,TAVG,TMAX,TMIN,COUNTRY
2,19800101 000000,0.00,72.0,86.0,59.0,Cameroon
6,19800101 000000,0.00,76.0,97.0,59.0,Senegal
7,19800101 000000,0.00,74.0,95.0,59.0,Senegal
8,19800101 000000,0.00,78.0,93.0,63.0,Senegal
9,19800101 000000,0.00,76.0,91.0,59.0,Senegal
...,...,...,...,...,...,...
207978,20001228 000000,0.03,54.0,61.0,49.0,Tunisia
207980,20001228 000000,0.03,53.0,57.0,46.0,Tunisia
208013,20001230 000000,0.35,52.0,62.0,45.0,Tunisia
208057,20001231 000000,0.08,49.0,53.0,43.0,Tunisia


In [None]:
#retrieve the rows for Senegal only
df4 = df3[(df3["COUNTRY"]== 'Senegal')]
df4

Unnamed: 0,DATE,PRCP,TAVG,TMAX,TMIN,COUNTRY
6,19800101 000000,0.00,76.0,97.0,59.0,Senegal
7,19800101 000000,0.00,74.0,95.0,59.0,Senegal
8,19800101 000000,0.00,78.0,93.0,63.0,Senegal
9,19800101 000000,0.00,76.0,91.0,59.0,Senegal
13,19800101 000000,0.00,74.0,81.0,66.0,Senegal
...,...,...,...,...,...,...
206287,20001026 000000,0.24,81.0,89.0,74.0,Senegal
206288,20001026 000000,0.00,86.0,96.0,70.0,Senegal
206295,20001026 000000,0.00,85.0,96.0,73.0,Senegal
206301,20001027 000000,0.00,83.0,92.0,75.0,Senegal


In [None]:
#keep the rows from 1999 onward; df4 already ends in 2000, so this covers 1999-2000 only, not 2000-2023
df5 = df4[df4["DATE"]>'1999']
df5

Unnamed: 0,DATE,PRCP,TAVG,TMAX,TMIN,COUNTRY
195396,19990917 000000,1.10,80.0,89.0,72.0,Senegal
195404,19990917 000000,1.54,77.0,82.0,72.0,Senegal
195408,19990917 000000,0.00,88.0,98.0,77.0,Senegal
195410,19990917 000000,0.43,80.0,91.0,71.0,Senegal
195430,19990918 000000,0.00,80.0,87.0,75.0,Senegal
...,...,...,...,...,...,...
206287,20001026 000000,0.24,81.0,89.0,74.0,Senegal
206288,20001026 000000,0.00,86.0,96.0,70.0,Senegal
206295,20001026 000000,0.00,85.0,96.0,73.0,Senegal
206301,20001027 000000,0.00,83.0,92.0,75.0,Senegal


In [None]:
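#histograms of Senegal's TMAX for 1980-2000 and 2001-2023, overlaid in the same figure
senegal = df[df["COUNTRY"] == 'Senegal'].copy()
senegal["PERIOD"] = senegal["DATE"].lt('2001').map({True: '1980-2000', False: '2001-2023'})
fig = px.histogram(senegal, x="TMAX", color="PERIOD", barmode="overlay", opacity=0.6, title='TMAX distribution in Senegal')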


In [None]:
fig.show()

In [None]:
#keep only the rows dated 2000-2023
df6 = df[df["DATE"]>'2000']
df6

Unnamed: 0,DATE,PRCP,TAVG,TMAX,TMIN,COUNTRY
198342,20000101 000000,0.00,47.0,56.0,34.0,Tunisia
198343,20000101 000000,0.00,75.0,85.0,66.0,Senegal
198344,20000101 000000,0.00,69.0,84.0,53.0,Egypt
198345,20000101 000000,0.00,73.0,88.0,63.0,Senegal
198346,20000101 000000,0.00,75.0,88.0,64.0,Senegal
...,...,...,...,...,...,...
464778,20230822 000000,0.00,85.0,93.0,81.0,Senegal
464786,20230823 000000,0.00,87.0,101.0,71.0,Tunisia
464799,20230823 000000,0.00,90.0,102.0,80.0,Tunisia
464803,20230823 000000,1.22,83.0,90.0,76.0,Senegal


In [None]:
# code to draw the bar chart; px.bar stacks one segment per row, so each bar's height is the sum of TAVG for that country
fig1 = px.bar(df6, x="COUNTRY", y="TAVG", title="Average temperature per Country 2000-2023", color="COUNTRY")

In [None]:
#show the result of the bar chart
fig1.show()
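
Because the raw frame holds many readings per country, a per-country mean has to be computed before plotting if the bars are meant to show averages rather than totals. A minimal sketch on invented toy numbers follows; the `toy`, `avg_by_country` and `total_by_country` names are hypothetical.

```python
import pandas as pd

# Invented toy data: Tunisia has more readings, Senegal has the warmer one.
toy = pd.DataFrame({
    "COUNTRY": ["Tunisia", "Tunisia", "Tunisia", "Senegal"],
    "TAVG": [50.0, 60.0, 70.0, 80.0],
})

# One row per country holding the mean TAVG, warmest first.
avg_by_country = (
    toy.groupby("COUNTRY", as_index=False)["TAVG"]
    .mean()
    .sort_values("TAVG", ascending=False)
)

# For comparison: the totals, which favor the country with more readings.
total_by_country = toy.groupby("COUNTRY")["TAVG"].sum()

print(avg_by_country)
print(total_by_country)
```

Passing the aggregated frame to `px.bar(avg_by_country, x="COUNTRY", y="TAVG", color="COUNTRY")` would then draw one bar per country at its mean. In the toy data the ranking by mean and the ranking by total disagree, which is why the aggregation step matters.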

# 1. Which country has the highest average temperature from 2000-2023?
# ans. The bar chart shows that Tunisia has the highest average temperature.
# 2. Which country has the lowest average temperature from 2000-2023?
# ans. The bar chart shows that Angola has the lowest average temperature.
# 3. Which country has the median average temperature from 2000-2023?
# ans. The bar chart shows that Egypt has the median average temperature.