# PyCity Schools Analysis

* As a whole, schools with higher budgets did not yield better test results. In fact, schools with higher spending (645-675 per student) underperformed compared to schools with smaller budgets (585 per student).

* As a whole, small and medium-sized schools dramatically outperformed large schools on math passing rates (89-91% passing vs. 67%).

* As a whole, charter schools outperformed the public district schools across all metrics. However, more analysis is required to determine whether the effect is due to school practices or to the fact that charter schools tend to serve smaller student populations per school.
---

In [5]:
!conda activate PythonData
# Dependencies and Setup
import pandas as pd
from pathlib import Path

# File to Load (Remember to Change These)
climate_change_data = Path("Resources/climate_change_data.csv")


# Read the climate change data file and store it in a pandas DataFrame
climate_change_data_pd = pd.read_csv(climate_change_data)

climate_change_data_pd.columns




Index(['Date', 'Location', 'Country', 'Temperature', 'CO2 Emissions',
       'Sea Level Rise', 'Precipitation', 'Humidity', 'Wind Speed'],
      dtype='object')

In [12]:
cleaned_Climate_data_df = climate_change_data_pd.rename(columns={"CO2 Emissions":"CO2_Emissions", "Sea Level Rise":"Sea_Level_Rise","Wind Speed":"Wind_Speed"})
cleaned_Climate_data_df.columns

Index(['Date', 'Location', 'Country', 'Temperature', 'CO2_Emissions',
       'Sea_Level_Rise', 'Precipitation', 'Humidity', 'Wind_Speed'],
      dtype='object')
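
The rename above lists each column by hand. As a minimal sketch of an alternative, assuming the only goal is to swap spaces for underscores, the same result can be reached for every label at once with the string accessor on the column index. The `sample_df` frame and its single row of values are made up for illustration; only the column labels come from the notebook.

```python
import pandas as pd

# Toy frame reusing the notebook's column labels (the row values are invented).
sample_df = pd.DataFrame(
    [["2000-01-01", "Somewhere", "Nowhere", 10.0, 400.0, 0.5, 12.0, 30.0, 15.0]],
    columns=["Date", "Location", "Country", "Temperature", "CO2 Emissions",
             "Sea Level Rise", "Precipitation", "Humidity", "Wind Speed"],
)

# Replace every space in every column label with an underscore.
sample_df.columns = sample_df.columns.str.replace(" ", "_", regex=False)

print(list(sample_df.columns))
```

The explicit `rename` mapping used in the notebook is still the safer choice when only some columns should change.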

In [16]:
# cleaned_Climate_data_df['Date'] = pd.to_datetime(cleaned_Climate_data_df['Date'])
cleaned_Climate_data_df

| | Date | Location | Country | Temperature | CO2_Emissions | Sea_Level_Rise | Precipitation | Humidity | Wind_Speed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-01 00:00:00.000000000 | New Williamtown | Latvia | 10.688986 | 403.118903 | 0.717506 | 13.835237 | 23.631256 | 18.492026 |
| 1 | 2000-01-01 20:09:43.258325832 | North Rachel | South Africa | 13.814430 | 396.663499 | 1.205715 | 40.974084 | 43.982946 | 34.249300 |
| 2 | 2000-01-02 16:19:26.516651665 | West Williamland | French Guiana | 27.323718 | 451.553155 | -0.160783 | 42.697931 | 96.652600 | 34.124261 |
| 3 | 2000-01-03 12:29:09.774977497 | South David | Vietnam | 12.309581 | 422.404983 | -0.475931 | 5.193341 | 47.467938 | 8.554563 |
| 4 | 2000-01-04 08:38:53.033303330 | New Scottburgh | Moldova | 13.210885 | 410.472999 | 1.135757 | 78.695280 | 61.789672 | 8.001164 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 2022-12-27 15:21:06.966696576 | South Elaineberg | Bhutan | 15.020523 | 391.379537 | -1.452243 | 93.417109 | 25.293814 | 6.531866 |
| 9996 | 2022-12-28 11:30:50.225022464 | Leblancville | Congo | 16.772451 | 346.921190 | 0.543616 | 49.882947 | 96.787402 | 42.249014 |
| 9997 | 2022-12-29 07:40:33.483348224 | West Stephanie | Argentina | 22.370025 | 466.042136 | 1.026704 | 30.659841 | 15.211825 | 18.293708 |
| 9998 | 2022-12-30 03:50:16.741674112 | Port Steven | Albania | 19.430853 | 337.899776 | -0.895329 | 18.932275 | 82.774520 | 42.424255 |
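
The `pd.to_datetime` call in the cell above is commented out, so the `Date` column presumably remains plain strings. The following is a small sketch of what enabling that conversion would do, using two timestamp strings copied from the displayed rows. The names `dates` and `parsed` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two Date values copied from the displayed rows, held as plain strings.
dates = pd.Series([
    "2000-01-01 00:00:00.000000000",
    "2000-01-01 20:09:43.258325832",
])

# Convert the strings to a datetime64 Series.
parsed = pd.to_datetime(dates)

# Once parsed, the .dt accessor exposes date parts for grouping or filtering.
print(parsed.dtype)
print(parsed.dt.year.tolist())
```

Converting before analysis makes it possible to group by year or month instead of comparing strings.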


In [14]:
cleaned_Climate_data_df.to_csv("Resources/cleaned_Climate_data.csv", encoding='utf8', index=False)
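
The `index=False` argument in the export above matters when the file is read back. As a rough illustration, the sketch below writes a toy frame to in-memory buffers with and without the index and compares the columns after reloading. The frame `df` and its values are invented, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Toy frame with two of the cleaned column names (values are invented).
df = pd.DataFrame({"Temperature": [10.7, 13.8], "CO2_Emissions": [403.1, 396.7]})

# Default export: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)

# Export with index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)
print(cols_without)
```

Without `index=False`, the reloaded frame picks up an extra `Unnamed: 0` column that then has to be dropped.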

In [3]:
# File to Load (Remember to Change These)
Nat_disaster_data = Path("Resources/EMDAT_1900-2021_NatDis_WIP.xlsx")


# Read the natural disaster data file and store it in a pandas DataFrame
Nat_disaster_data_pd = pd.read_excel(Nat_disaster_data)

Nat_disaster_data_pd.columns


Index(['Dis No', 'Year', 'Seq', 'Disaster Group', 'Disaster Subgroup',
       'Disaster Type', 'Disaster Subtype', 'Disaster Subsubtype',
       'Event Name', 'Entry Criteria', 'Country', 'ISO', 'Region', 'Continent',
       'Location', 'Origin', 'Associated Dis', 'Associated Dis2',
       'OFDA Response', 'Appeal', 'Declaration', 'Aid Contribution',
       'Dis Mag Value', 'Dis Mag Scale', 'Latitude', 'Longitude', 'Local Time',
       'River Basin', 'Start Year', 'Start Month', 'Start Day', 'End Year',
       'End Month', 'End Day', 'Total Deaths', 'No Injured', 'No Affected',
       'No Homeless', 'Total Affected', 'Reconstruction Costs ('000 US$)',
       'Insured Damages ('000 US$)', 'Total Damages ('000 US$)', 'CPI'],
      dtype='object')

In [4]:
# Columns not needed for the analysis (each label listed once)
col_todrop = ['Disaster Subgroup', 'Disaster Subtype', 'Disaster Subsubtype',
              'No Injured', 'No Affected', 'No Homeless', 'Total Affected']
cleaned_Nat_disaster_data_pd = Nat_disaster_data_pd.drop(labels=col_todrop, axis=1)
cleaned_Nat_disaster_data_pd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15827 entries, 0 to 15826
Data columns (total 36 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Dis No                           15827 non-null  object 
 1   Year                             15827 non-null  int64  
 2   Seq                              15827 non-null  int64  
 3   Disaster Group                   15827 non-null  object 
 4   Disaster Type                    15827 non-null  object 
 5   Event Name                       3803 non-null   object 
 6   Entry Criteria                   15492 non-null  object 
 7   Country                          15827 non-null  object 
 8   ISO                              15827 non-null  object 
 9   Region                           15827 non-null  object 
 10  Continent                        15827 non-null  object 
 11  Location                         14019 non-null  object 
 12  Origin            
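
A hand-written drop list like the one used before `info()` can contain a repeated label or a label that is not in the frame. The sketch below shows one defensive way to handle both cases, using a made-up frame with a few of the EM-DAT column names. The `sample` frame, its values, and the `"Not A Column"` label are hypothetical.

```python
import pandas as pd

# Toy frame with a few of the disaster column names (values are invented).
sample = pd.DataFrame({
    "Year": [2000],
    "Disaster Type": ["Flood"],
    "Disaster Subtype": ["Riverine flood"],
    "No Injured": [3],
})

# A drop list with a repeated label and one label that does not exist.
cols_to_drop = ["Disaster Subtype", "No Injured", "Disaster Subtype", "Not A Column"]

# dict.fromkeys removes duplicates while keeping the original order.
unique_cols = list(dict.fromkeys(cols_to_drop))

# errors="ignore" skips labels that are absent instead of raising KeyError.
trimmed = sample.drop(columns=unique_cols, errors="ignore")

print(list(trimmed.columns))
```

Using `errors="ignore"` trades strictness for convenience: a misspelled label is skipped silently, so it suits exploratory cleanup more than a checked pipeline.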

## Clean up