
In [16]:
import pandas as pd
import plotly.express as px

In [10]:
#Load the dataset into a data frame using Python.
file_id = '1I8eV4-8p61CNNlVJzzho2xeoZ5-P7Q0F'
url = f'https://drive.google.com/uc?export=download&id={file_id}'
africa_climate_change = pd.read_csv(url)
africa_climate_change.head()

              DATE  PRCP  TAVG  TMAX  TMIN   COUNTRY
0  19800101 000000   NaN  54.0  61.0  43.0   Tunisia
1  19800101 000000   NaN  49.0  55.0  41.0   Tunisia
2  19800101 000000   0.0  72.0  86.0  59.0  Cameroon
3  19800101 000000   NaN  50.0  55.0  43.0   Tunisia
4  19800101 000000   NaN  75.0  91.0   NaN  Cameroon


In [15]:
#Clean the data as needed.
# Step 1: Inspect the dataset
print(africa_climate_change.shape)  # rows, columns
print(africa_climate_change.info()) # column types
print(africa_climate_change.isnull().sum())  # missing values
print(africa_climate_change.head())
# Step 2: Remove duplicates
africa_climate_change.drop_duplicates(inplace=True)

# Step 3: Handle missing values (example strategies)
# Fill numeric columns with median
numeric_cols = africa_climate_change.select_dtypes(include=['float64', 'int64']).columns
africa_climate_change[numeric_cols] = africa_climate_change[numeric_cols].fillna(
    africa_climate_change[numeric_cols].median()
)
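# Fill categorical columns with mode
# Assign the result back rather than calling fillna(inplace=True) on a
# column selection, which pandas flags as chained assignment
categorical_cols = africa_climate_change.select_dtypes(include=['object']).columns
for col in categorical_cols:
    africa_climate_change[col] = africa_climate_change[col].fillna(africa_climate_change[col].mode()[0])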
# Step 4: Fix column names (remove spaces, lowercase)
africa_climate_change.columns = africa_climate_change.columns.str.strip().str.lower().str.replace(' ', '_')

# Step 5: Convert data types if needed (example: year to int)
if 'year' in africa_climate_change.columns:
    africa_climate_change['year'] = pd.to_numeric(africa_climate_change['year'], errors='coerce').astype('Int64')
# Step 6: Save cleaned dataset (optional)
africa_climate_change.to_csv('africa_climate_change_cleaned.csv', index=False)
# Final check
print(africa_climate_change.head())


(461470, 6)
<class 'pandas.core.frame.DataFrame'>
Index: 461470 entries, 0 to 464814
Data columns (total 6 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   date     461470 non-null  object 
 1   prcp     461470 non-null  float64
 2   tavg     461470 non-null  float64
 3   tmax     461470 non-null  float64
 4   tmin     461470 non-null  float64
 5   country  461470 non-null  object 
dtypes: float64(4), object(2)
memory usage: 24.6+ MB
None
date       0
prcp       0
tavg       0
tmax       0
tmin       0
country    0
dtype: int64
              date  prcp  tavg  tmax  tmin   country
0  19800101 000000   0.0  54.0  61.0  43.0   Tunisia
1  19800101 000000   0.0  49.0  55.0  41.0   Tunisia
2  19800101 000000   0.0  72.0  86.0  59.0  Cameroon
3  19800101 000000   0.0  50.0  55.0  43.0   Tunisia
4  19800101 000000   0.0  75.0  91.0  68.0  Cameroon


              date  prcp  tavg  tmax  tmin   country
0  19800101 000000   0.0  54.0  61.0  43.0   Tunisia
1  19800101 000000   0.0  49.0  55.0  41.0   Tunisia
2  19800101 000000   0.0  72.0  86.0  59.0  Cameroon
3  19800101 000000   0.0  50.0  55.0  43.0   Tunisia
4  19800101 000000   0.0  75.0  91.0  68.0  Cameroon
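
The missing-value step in the cleaning cell above can also be written as a single `fillna` call with one fill value per column, which avoids chained assignment altogether. The following is a minimal sketch on a made-up toy frame (the `toy`, `fill_values`, and `filled` names and the values are illustrative, not part of the notebook's data):

```python
import pandas as pd

# Toy frame standing in for the climate data (values are made up for illustration)
toy = pd.DataFrame({
    "tavg": [54.0, None, 72.0],
    "country": ["Tunisia", None, "Tunisia"],
})

# Build one fill value per column: median for numeric columns, mode for the rest
fill_values = {}
for col in toy.columns:
    if pd.api.types.is_numeric_dtype(toy[col]):
        fill_values[col] = toy[col].median()
    else:
        fill_values[col] = toy[col].mode()[0]

# fillna with a dict returns a new frame and leaves `toy` untouched
filled = toy.fillna(fill_values)
print(filled)
```

Whether a median or mode fill is appropriate depends on the column; for precipitation in particular, a filled value is a modelling choice and not an observation.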


In [33]:
#Plot a line chart to show the average temperature fluctuations in Tunisia and Cameroon. Interpret the results.
# Normalize country names
africa_climate_change['country'] = africa_climate_change['country'].str.strip().str.title()
# Filter for Tunisia and Cameroon (.copy() so new columns can be added without SettingWithCopyWarning)
subset = africa_climate_change[africa_climate_change['country'].isin(['Tunisia', 'Cameroon'])].copy()
# Convert date to datetime
subset['date'] = pd.to_datetime(subset['date'], errors='coerce')
# Extract year
subset['year'] = subset['date'].dt.year
# Group by year and country
avg_temp_fluctuations = subset.groupby(['year', 'country'])['tavg'].mean().reset_index()
# Plot line chart
fig = px.line(
    avg_temp_fluctuations,
    x='year',
    y='tavg',
    color='country',
    title='Average Temperature Fluctuations in Tunisia and Cameroon'
)
fig.show()
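
Adding columns to a filtered DataFrame is where pandas emits `SettingWithCopyWarning`. A small sketch of the pattern that avoids it, using a made-up toy frame: take an explicit `.copy()` of the filtered rows before assigning new columns. The explicit `format` string is an assumption based on the `YYYYMMDD HHMMSS` strings visible in the sample rows.

```python
import pandas as pd

# Toy data in the same shape as the notebook's frame (values are illustrative only)
df = pd.DataFrame({
    "date": ["19800101 000000", "19810101 000000", "19800101 000000"],
    "tavg": [54.0, 56.0, 72.0],
    "country": ["Tunisia", "Tunisia", "Senegal"],
})

# .copy() makes the filtered frame independent of df
tunisia = df[df["country"] == "Tunisia"].copy()

# Assumed format, inferred from strings like '19800101 000000'
tunisia["date"] = pd.to_datetime(
    tunisia["date"], format="%Y%m%d %H%M%S", errors="coerce"
)
tunisia["year"] = tunisia["date"].dt.year

print(tunisia[["year", "tavg"]])
```

The original frame `df` is left unchanged, so later cells that filter it again start from the raw strings.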




In [34]:
#Zoom in to only include data between 1980 and 2005, try to customize the axes labels.
# Filter years between 1980 and 2005
subset = subset[(subset['year'] >= 1980) & (subset['year'] <= 2005)]
# Group by year and country
avg_temp_fluctuations = subset.groupby(['year', 'country'])['tavg'].mean().reset_index()
# Plot with custom labels
fig = px.line(
    avg_temp_fluctuations,
    x='year',
    y='tavg',
    color='country',
    title='Average Temperature Fluctuations (1980–2005) in Tunisia and Cameroon',
    labels={
        'year': 'Year',
        'tavg': 'Average Temperature (°F)',  # data appears to be in Fahrenheit
        'country': 'Country'
    }
)
fig.show()
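
The sample rows shown earlier (for example a TMAX of 91.0 for Cameroon) suggest that the temperature columns are in degrees Fahrenheit. If a Celsius axis is preferred, the values can be converted before plotting. This is a sketch under that assumption; `fahrenheit_to_celsius`, `yearly`, and `tavg_c` are hypothetical names, and the numbers are illustrative, not taken from the dataset.

```python
import pandas as pd


def fahrenheit_to_celsius(temp_f):
    """Convert Fahrenheit to Celsius; works on scalars and pandas Series."""
    return (temp_f - 32) * 5 / 9


# Illustrative yearly means, not taken from the dataset
yearly = pd.DataFrame({"year": [1980, 1981], "tavg": [68.0, 86.0]})
yearly["tavg_c"] = fahrenheit_to_celsius(yearly["tavg"])
print(yearly)
```

The converted column could then be passed as `y` to `px.line` with a matching `(°C)` axis label.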

In [38]:
#Create Histograms to show temperature distribution in Senegal between [1980,2000] and [2000,2023] (in the same figure). Describe the obtained results.
# Ensure country names are standardized
africa_climate_change['country'] = africa_climate_change['country'].str.strip().str.title()
# Filter for Senegal only (.copy() avoids SettingWithCopyWarning when adding columns)
senegal_data = africa_climate_change[africa_climate_change['country'] == 'Senegal'].copy()
# Convert date to datetime and extract year
senegal_data['date'] = pd.to_datetime(senegal_data['date'], errors='coerce')
senegal_data['year'] = senegal_data['date'].dt.year
# Keep only valid years
senegal_data = senegal_data.dropna(subset=['year'])
# Add a new column for the period
def get_period(year):
    if 1980 <= year <= 2000:
        return '1980–2000'
    elif 2000 < year <= 2023:
        return '2000–2023'
    else:
        return None
senegal_data['period'] = senegal_data['year'].apply(get_period)
senegal_data = senegal_data.dropna(subset=['period'])
# Plot histograms for both periods in the same figure
fig = px.histogram(
    senegal_data,
    x='tavg',
    color='period',
    barmode='overlay',  # overlay for comparison, can change to 'group'
    nbins=30,
    title='Temperature Distribution in Senegal (1980–2000 vs 2000–2023)',
    labels={
        'tavg': 'Average Temperature (°F)',  # data appears to be in Fahrenheit
        'period': 'Time Period'
    },
    opacity=0.6
)

fig.show()
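
The period labelling above uses a row-wise `apply`. On a frame of this size, a vectorized alternative is `pd.cut`, sketched here on a handful of made-up years. The bin edges are chosen so that 2000 falls in the first period, matching `get_period`; years outside both ranges become missing and can be dropped as before.

```python
import pandas as pd

# A few illustrative years, including values outside both periods
years = pd.Series([1979, 1980, 2000, 2001, 2023, 2024])

# Bins are right-closed by default: (1979, 2000] and (2000, 2023]
period = pd.cut(
    years,
    bins=[1979, 2000, 2023],
    labels=["1980–2000", "2000–2023"],
)

print(pd.DataFrame({"year": years, "period": period}))
```

The result is a categorical Series, which `px.histogram` accepts for its `color` argument in the same way as the string labels.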





In [37]:
#Select the best chart to show the Average temperature per country.
# Calculate average temperature per country
avg_temp_per_country = africa_climate_change.groupby('country')['tavg'].mean().reset_index()

# Sort by temperature
avg_temp_per_country = avg_temp_per_country.sort_values(by='tavg', ascending=False)

# Plot bar chart
fig = px.bar(
    avg_temp_per_country,
    x='country',
    y='tavg',
    title='Average Temperature per Country',
    labels={'tavg': 'Average Temperature (°F)', 'country': 'Country'},  # data appears to be in Fahrenheit
    color='tavg',
    color_continuous_scale='Viridis'
)
fig.update_layout(xaxis={'categoryorder': 'total descending'})
fig.show()