I chose this dataset because it closely matches what I needed to give a holistic view of climate change and to show directly how mean temperature has changed over the years. I chose the data provided by the IMF because it offers global coverage, credibility, and consistency.

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

# Load the dataset
file_path = '/Users/fionamagee/Desktop/Sprint2/dataset1.csv'
data = pd.read_csv(file_path)

# Drop Unnecessary Columns
columns_to_drop = ['ObjectId', 'ISO2', 'ISO3', 'Indicator', 'Unit', 'Source', 'CTS_Code', 'CTS_Name', 'CTS_Full_Descriptor']
data_cleaned = data.drop(columns=columns_to_drop)

# Rename Columns
data_cleaned.rename(columns={'Country': 'country'}, inplace=True)

# Deal with Missing Values
# Fill missing values with the mean of each column, but only for numeric columns
numeric_cols = data_cleaned.select_dtypes(include=[np.number]).columns
data_cleaned[numeric_cols] = data_cleaned[numeric_cols].fillna(data_cleaned[numeric_cols].mean())

# Tidy the Data
# Convert the dataset from wide format to long format
data_tidy = pd.melt(data_cleaned, id_vars=['country'], var_name='year', value_name='temperature_change')

# Remove the 'F' prefix from the year labels, treating 'F' as a literal string
data_tidy['year'] = data_tidy['year'].str.replace('F', '', regex=False)

# Display the tidied dataset
print(data_tidy.head())




                        country  year  temperature_change
0  Afghanistan, Islamic Rep. of  1961              -0.113
1                       Albania  1961               0.627
2                       Algeria  1961               0.164
3                American Samoa  1961               0.079
4      Andorra, Principality of  1961               0.736


Cleaned Data Analysis
1. One observation I made while looking at this data was that it was very long and included many unnecessary columns.
2. The columns of this data included ObjectId, ISO2, ISO3, Indicator, Unit, Source, CTS_Code, CTS_Name, CTS_Full_Descriptor, country, year, and temperature change. Only country, year, and temperature change were kept after cleaning.
3. There were some missing values, which I filled with the mean of each numeric column.
4. The distribution of the continuous variable (temperature change) varies by country and year. For example, 1961 had no outliers, while the outliers in 2010 included 2.265°C, 2.775°C, and 2.327°C.
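
The notes above do not say how the outliers were identified. The following is a minimal sketch of one way to check them, assuming the common 1.5 × IQR (box plot) rule and the tidy columns country, year, and temperature_change. The function name iqr_outliers and the sample values are made up for illustration and are not taken from the IMF data.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, year: str) -> pd.DataFrame:
    """Return the rows for `year` whose temperature_change falls outside the 1.5 * IQR fences."""
    in_year = df["year"] == year
    values = df.loc[in_year, "temperature_change"]

    # Quartiles and interquartile range for the chosen year only
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Explicit comparisons so that missing values are never flagged as outliers
    outside = (df["temperature_change"] < lower) | (df["temperature_change"] > upper)
    return df[in_year & outside]


# Toy stand-in for data_tidy (placeholder values, not real measurements)
sample = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "year": ["2010"] * 5,
    "temperature_change": [0.9, 1.0, 1.1, 1.2, 3.0],
})

outliers = iqr_outliers(sample, "2010")
print(outliers)
```

With the real data, calling iqr_outliers(data_tidy, '2010') would list the countries behind the 2010 outliers, provided the same rule was used to find them originally.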

In [3]:
dataDict = {
    # Entry for the country column
    'country': {
        'description': 'Name of the country',
        'type': 'string'
    },
    # Entry for the year column
    'year': {
        'description': 'Year of the temperature change record',
        'type': 'string'
    },
    # Entry for the temperature_change column
    'temperature_change': {
        'description': 'Temperature change in degrees Celsius',
        'type': 'float'
    }
}

# Converted the data dictionary to a DataFrame so it can be easily viewed
finalDatadict = pd.DataFrame.from_dict(dataDict, orient='index')
finalDatadict.index.name = 'column'
finalDatadict.reset_index(inplace=True)

# Display the data dictionary
print(finalDatadict)

               column                            description    type
0             country                    Name of the country  string
1                year  Year of the temperature change record  string
2  temperature_change  Temperature change in degrees Celsius   float


UI Components 
- I’d include a dropdown menu to select the country
- I’d include infographics and fact boxes to provide additional information about climate change progress that cannot be shown through data visualization
- I’d include a CSS stylesheet to tie the website together and make it appealing to the user


Data Visualization Components 
- I’d include a dropdown menu to select multiple countries
- I’d ideally include a bar graph, since it shows the change in temperature most clearly
- I could also include a slider to choose which years are visualized
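
The components listed above could be wired together roughly as follows. This is only a sketch under assumptions, not a finished design: the helper names filter_temperatures and build_app, the component ids, and the sample values are all hypothetical, and the year column is assumed to hold digit-only strings such as '1961'.

```python
import pandas as pd


def filter_temperatures(df, countries, year_range):
    """Return the rows for the selected countries within the inclusive year range."""
    years = df["year"].astype(int)
    mask = df["country"].isin(countries) & years.between(year_range[0], year_range[1])
    return df[mask]


def build_app(df):
    """Build a Dash app with a multi-select dropdown, a year range slider, and a bar graph."""
    # Imported here so filter_temperatures can be reused without Dash installed
    import plotly.express as px
    from dash import Dash, dcc, html, Input, Output

    years = df["year"].astype(int)
    first_year, last_year = int(years.min()), int(years.max())

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(
            id="country-dropdown",
            options=[{"label": c, "value": c} for c in sorted(df["country"].unique())],
            value=[df["country"].iloc[0]],
            multi=True,  # allow several countries to be compared at once
        ),
        dcc.RangeSlider(
            id="year-slider",
            min=first_year,
            max=last_year,
            step=1,
            value=[first_year, last_year],
            tooltip={"placement": "bottom", "always_visible": True},
        ),
        dcc.Graph(id="temperature-bar"),
    ])

    @app.callback(
        Output("temperature-bar", "figure"),
        Input("country-dropdown", "value"),
        Input("year-slider", "value"),
    )
    def update_bar(countries, year_range):
        # An empty dropdown yields None or [], which produces an empty chart
        selected = filter_temperatures(df, countries or [], year_range)
        return px.bar(selected, x="year", y="temperature_change",
                      color="country", barmode="group")

    return app


# Toy stand-in for data_tidy (placeholder values, not real measurements)
sample = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B"],
    "year": ["1961", "1962", "1961", "1962"],
    "temperature_change": [0.1, 0.2, 0.3, 0.4],
})

selected = filter_temperatures(sample, ["Country A"], [1962, 1962])
print(selected)
```

Calling build_app(data_tidy).run(debug=True) should start the app locally on recent Dash versions (older releases use run_server instead). Keeping the filtering in its own function means the dropdown and slider logic can be checked without launching a server.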
