This notebook covers the creation of CSVs that will be used to populate the meteorite database.
The following tables will be created in the database:

|table name|columns|data types|csv data file|
|------|------|------|------|
|meteorite_main|id|integer|meteorite_main.csv|
||name |VARCHAR(50)||
||recclass |VARCHAR(50)||
||mass_grams |float||
||fall |VARCHAR(10)||
||year |integer||
||reclat |float||
||reclong |float||
||geolocation |varchar(50)||
||geometry |varchar(50)||
||elevation |float||
||country |varchar(50)||
||state_abbrev |varchar(10)||
|state|state_abbrev |varchar(10)|state.csv|
||state |VARCHAR(50)||
||FIPS|integer||
||area_sqkm |integer||
||country |varchar(50)||
|landcover|id |Serial|landcover.csv|
||state_abbrev |varchar(10)||
||variable |varchar(50)||
||value |float||
|meteorite_type|recclass |VARCHAR(50)|meteorite_types.csv|
||meteorite_class_subclass |VARCHAR(50)||
||meteorite_class |VARCHAR(50)||
||meteorite_type |VARCHAR(50)||





In [2]:
import pandas as pd
import plotly.express as px
from geopy.geocoders import Nominatim


* Pull in clean datasets for meteorites, land cover, meteorite types, and state FIPS codes and areas
* Create DataFrames that mirror the db tables
* Save each one as a csv

## meteorite_type table

db table:

CREATE TABLE meteorite_type (
    recclass VARCHAR(50),
    meteorite_class_subclass VARCHAR(50),
    meteorite_class VARCHAR(50),
    meteorite_type VARCHAR(50),
    PRIMARY KEY (recclass)
);

In [3]:
# pull in meteorite_types csv sourced from (ADD WEBSITES HERE)
meteorite_type = pd.read_csv('Resources/Raw_Data/meteorite_types.csv')
print(meteorite_type.dtypes)
meteorite_type.head()

recclass                    object
meteorite_class_subclass    object
meteorite_class             object
meteorite_type              object
dtype: object


Unnamed: 0,recclass,meteorite_class_subclass,meteorite_class,meteorite_type
0,H5,Chrondrite - ordinary,Chrondrite,Chrondrite
1,L6,Chrondrite - ordinary,Chrondrite,Chrondrite
2,L5,Chrondrite - ordinary,Chrondrite,Chrondrite
3,"Iron, ungrouped",Iron - other,Iron,Iron
4,"Iron, IVA",Iron - magmatic,Iron,Iron


* meteorite_type table schema and csv are compatible
* Use meteorite_types.csv to populate meteorite_type db table.
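Whether a DataFrame and a CREATE TABLE statement are "compatible" can also be checked mechanically rather than by eye. The sketch below is a minimal, assumed approach: `SQL_TO_KIND` and `schema_mismatches` are hypothetical helpers (not part of this project), and the mapping only covers the SQL types used in these tables.

```python
import pandas as pd

# Hypothetical mapping from SQL type names to accepted pandas dtype "kind" codes:
# 'O' = object (strings), 'i' = signed integer, 'f' = float.
SQL_TO_KIND = {
    "varchar": {"O"},
    "integer": {"i"},
    "float": {"f", "i"},  # integer values also load cleanly into a float column
}

def schema_mismatches(df: pd.DataFrame, schema: dict) -> list:
    """Return human-readable problems; an empty list means compatible."""
    problems = []
    missing = set(schema) - set(df.columns)
    extra = set(df.columns) - set(schema)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    for col, sql_type in schema.items():
        if col not in df.columns:
            continue
        base = sql_type.lower().split("(")[0]  # "VARCHAR(50)" -> "varchar"
        allowed = SQL_TO_KIND.get(base)
        if allowed is not None and df[col].dtype.kind not in allowed:
            problems.append(f"{col}: dtype {df[col].dtype} vs {sql_type}")
    return problems

# Column names and types copied from the meteorite_type DDL above
meteorite_type_schema = {
    "recclass": "VARCHAR(50)",
    "meteorite_class_subclass": "VARCHAR(50)",
    "meteorite_class": "VARCHAR(50)",
    "meteorite_type": "VARCHAR(50)",
}
sample = pd.DataFrame({
    "recclass": ["H5"],
    "meteorite_class_subclass": ["Chrondrite - ordinary"],
    "meteorite_class": ["Chrondrite"],
    "meteorite_type": ["Chrondrite"],
})
print(schema_mismatches(sample, meteorite_type_schema))  # []
```

In the notebook, the same call could be made on `meteorite_type` itself.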


In [4]:
# # write meteorite_type to csv - csv already created, commented out so it doesn't get overwritten. Uncomment if needed.
# meteorite_type.to_csv('Resources/DB_CSV/meteorite_types.csv', index = False)

In [5]:
# # check meteorite_type.csv - csv already created, commented out so it doesn't get overwritten. Uncomment if needed.
# meteorite_type_check = pd.read_csv('Resources/DB_CSV/meteorite_types.csv')
# print(len(meteorite_type_check))
# print(meteorite_type_check.dtypes)
# meteorite_type_check.head()

## state table

db table:

CREATE TABLE state (
    state_abbrev varchar(10),
    state VARCHAR(50),
    area_sqkm integer,
    FIPS integer,
    country varchar(50),
    PRIMARY KEY (state_abbrev)
);


In [6]:
# pull in state FIPS csv sourced from (ADD WEBSITES HERE)
state_fips = pd.read_csv('Resources/Raw_Data/state_fips.csv')
print(len(state_fips))
print(state_fips.dtypes)
state_fips.head()

53
state           object
state_abbrev    object
FIPS             int64
dtype: object


Unnamed: 0,state,state_abbrev,FIPS
0,Alabama,AL,1
1,Alaska,AK,2
2,Arizona,AZ,4
3,Arkansas,AR,5
4,California,CA,6


In [7]:
# pull in state sqkm csv sourced from (ADD WEBSITES HERE)
state_sqkm = pd.read_csv('Resources/Raw_Data/state_sqkm.csv')
print(len(state_sqkm))
print(state_sqkm.dtypes)
state_sqkm.head()


57
state        object
area_sqkm     int64
dtype: object


Unnamed: 0,state,area_sqkm
0,Alabama,135767
1,Alaska,1723337
2,Arizona,295234
3,Arkansas,137732
4,California,423967


In [8]:
# merge state_fips and state_sqkm on state name to create state df
state2 = pd.merge(state_fips, state_sqkm, on='state')
# .copy() makes state an independent DataFrame, so adding the country column
# below does not trigger a SettingWithCopyWarning
state = state2[['state_abbrev','state','FIPS','area_sqkm']].copy()
state['country'] = 'United States'
print(len(state))
print(state.dtypes)
state.head()

52
state_abbrev    object
state           object
FIPS             int64
area_sqkm        int64
country         object
dtype: object


Unnamed: 0,state_abbrev,state,FIPS,area_sqkm,country
0,AL,Alabama,1,135767,United States
1,AK,Alaska,2,1723337,United States
2,AZ,Arizona,4,295234,United States
3,AR,Arkansas,5,137732,United States
4,CA,California,6,423967,United States


In [9]:
# # write state to csv - csv already created, commented out so it doesn't get overwritten. Uncomment if needed.
# state.to_csv('Resources/DB_CSV/state.csv', index = False)

In [10]:
# # check state.csv - csv already created, commented out so it doesn't get overwritten. Uncomment if needed.
# state_check = pd.read_csv('Resources/DB_CSV/state.csv')
# print(len(state_check))
# print(state_check.dtypes)
# state_check.head()


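* State table schema and csv are compatible (same columns and types)
* The csv column order (state_abbrev, state, FIPS, area_sqkm, country) differs from the CREATE TABLE order (area_sqkm before FIPS), so map columns by name when importing; a positional import would silently swap FIPS and area_sqkm
* Use state.csv to populate state db table.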
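Because a positional import (for example, PostgreSQL `COPY` without a column list) maps csv fields to table columns by position, one defensive option is to reorder the DataFrame to the CREATE TABLE order before writing it. A minimal sketch: the column list is copied from the `state` DDL above, and `to_schema_order` is a hypothetical helper.

```python
import pandas as pd

# Column order from the CREATE TABLE state statement
STATE_DDL_ORDER = ["state_abbrev", "state", "area_sqkm", "FIPS", "country"]

def to_schema_order(df: pd.DataFrame, ddl_order: list) -> pd.DataFrame:
    """Return a copy of df with columns in DDL order; fail loudly if any are missing."""
    missing = [c for c in ddl_order if c not in df.columns]
    if missing:
        raise KeyError(f"columns missing from DataFrame: {missing}")
    return df[ddl_order].copy()

# Two rows in the notebook's csv column order (values from the output above)
state = pd.DataFrame({
    "state_abbrev": ["AL", "AK"],
    "state": ["Alabama", "Alaska"],
    "FIPS": [1, 2],
    "area_sqkm": [135767, 1723337],
    "country": ["United States", "United States"],
})
state_ordered = to_schema_order(state, STATE_DDL_ORDER)
print(list(state_ordered.columns))
# ['state_abbrev', 'state', 'area_sqkm', 'FIPS', 'country']
# state_ordered.to_csv('Resources/DB_CSV/state.csv', index=False)  # if regenerating
```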

## state_FIPS_char table - contains FIPS column as character

db table:

CREATE TABLE state (
    state_abbrev varchar(10),
    state VARCHAR(50),
    area_sqkm integer,
    FIPS varchar(10),
    country varchar(50),
    PRIMARY KEY (state_abbrev)
);

In [11]:
# pull in state_fips_char FIPS csv sourced from (ADD WEBSITES HERE)
state_fips_char = pd.read_csv('Resources/Raw_Data/state_fips_char.csv')
print(len(state_fips_char))
print(state_fips_char.dtypes)
state_fips_char.head()

53
state           object
state_abbrev    object
FIPS             int64
FIPS_char        int64
dtype: object


Unnamed: 0,state,state_abbrev,FIPS,FIPS_char
0,Alabama,AL,1,1
1,Alaska,AK,2,2
2,Arizona,AZ,4,4
3,Arkansas,AR,5,5
4,California,CA,6,6


In [12]:
# pull in state sqkm csv sourced from (ADD WEBSITES HERE)
state_sqkm = pd.read_csv('Resources/Raw_Data/state_sqkm.csv')
print(len(state_sqkm))
print(state_sqkm.dtypes)
state_sqkm.head()

57
state        object
area_sqkm     int64
dtype: object


Unnamed: 0,state,area_sqkm
0,Alabama,135767
1,Alaska,1723337
2,Arizona,295234
3,Arkansas,137732
4,California,423967


In [13]:
# merge state_fips_char and state_sqkm on state name to create state df
state_fips_char_2 = pd.merge(state_fips_char, state_sqkm, on='state')
# .copy() avoids SettingWithCopyWarning when adding/modifying columns below
state_fips_char_3 = state_fips_char_2[['state_abbrev','state','FIPS','area_sqkm']].copy()
state_fips_char_3['country'] = 'United States'
# store FIPS as a two-character string with a leading zero where needed ('1' -> '01')
state_fips_char_3['FIPS'] = state_fips_char_3['FIPS'].astype(str).str.zfill(2)
print(len(state_fips_char_3))
print(state_fips_char_3.dtypes)
state_fips_char_3.head(10)

52
state_abbrev    object
state           object
FIPS            object
area_sqkm        int64
country         object
dtype: object


Unnamed: 0,state_abbrev,state,FIPS,area_sqkm,country
0,AL,Alabama,1,135767,United States
1,AK,Alaska,2,1723337,United States
2,AZ,Arizona,4,295234,United States
3,AR,Arkansas,5,137732,United States
4,CA,California,6,423967,United States
5,CO,Colorado,8,269601,United States
6,CT,Connecticut,9,14357,United States
7,DE,Delaware,10,6446,United States
8,DC,District of Columbia,11,177,United States
9,FL,Florida,12,170312,United States


In [14]:
# write state_fips_char_3 to csv (this line is active and overwrites the existing csv; comment it out to preserve the file)
state_fips_char_3.to_csv('Resources/DB_CSV/state_FIPS_char.csv', index = False)

In [15]:
# check state_FIPS_char.csv by reading it back
state_fips_char_check = pd.read_csv('Resources/DB_CSV/state_FIPS_char.csv')
print(len(state_fips_char_check))
print(state_fips_char_check.dtypes)
state_fips_char_check.head()

52
state_abbrev    object
state           object
FIPS             int64
area_sqkm        int64
country         object
dtype: object


Unnamed: 0,state_abbrev,state,FIPS,area_sqkm,country
0,AL,Alabama,1,135767,United States
1,AK,Alaska,2,1723337,United States
2,AZ,Arizona,4,295234,United States
3,AR,Arkansas,5,137732,United States
4,CA,California,6,423967,United States


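* state_fips_char table schema and csv are compatible
* The csv file is written with FIPS as zero-padded strings, but `pd.read_csv` parses the column back as int64 by default (see the check output above), which drops the leading zeroes; pass `dtype={'FIPS': str}` when reading it back to keep them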
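The round trip can be seen on a tiny in-memory example. This sketch uses `io.StringIO` in place of the real file and made-up rows; it shows that the default read loses the zeroes while `dtype={'FIPS': str}` keeps them.

```python
import io
import pandas as pd

fips = pd.DataFrame({"state_abbrev": ["AL", "CO", "DE"], "FIPS": [1, 8, 10]})

# Zero-pad to two characters ('1' -> '01', '10' stays '10')
fips["FIPS"] = fips["FIPS"].astype(str).str.zfill(2)

# Write to an in-memory buffer that stands in for the csv file
buffer = io.StringIO()
fips.to_csv(buffer, index=False)

buffer.seek(0)
naive = pd.read_csv(buffer)                           # FIPS parsed as int64
buffer.seek(0)
preserved = pd.read_csv(buffer, dtype={"FIPS": str})  # FIPS kept as text

print(naive["FIPS"].tolist())      # [1, 8, 10]
print(preserved["FIPS"].tolist())  # ['01', '08', '10']
```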

## landcover table

db table:

CREATE TABLE landcover (
    id Serial,
    state_abbrev varchar(10),
    variable varchar(50),
    value float,
    PRIMARY KEY (id)
);

In [10]:
# pull in landcover info here - Greg sourced from (website here)
landcover2 = pd.read_csv('Resources/Raw_Data/LAND_COVER_14032023014858227.csv')
landcover2.head()

Unnamed: 0,COU,Country,SMALL_SUBNATIONAL_REGION,Small subnational region,LARGE_SUBNATIONAL_REGION,Large subnational region,MEAS,Measure,VARIABLE,Land cover class,...,Year,Unit Code,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,Value,Flag Codes,Flags
0,AUS,Australia,,Not applicable,GAUL1_467_2015,Administrative unit not available,THOUSAND_SQKM,Square kilometers (000's),CROPL,Cropland,...,1992,KM2,Square kilometres,3,Thousands,,,0.0,,
1,AUS,Australia,,Not applicable,GAUL1_467_2015,Administrative unit not available,THOUSAND_SQKM,Square kilometers (000's),CROPL,Cropland,...,2004,KM2,Square kilometres,3,Thousands,,,0.0,,
2,AUS,Australia,,Not applicable,GAUL1_467_2015,Administrative unit not available,THOUSAND_SQKM,Square kilometers (000's),CROPL,Cropland,...,2015,KM2,Square kilometres,3,Thousands,,,0.0,,
3,AUS,Australia,,Not applicable,GAUL1_467_2015,Administrative unit not available,THOUSAND_SQKM,Square kilometers (000's),CROPL,Cropland,...,2018,KM2,Square kilometres,3,Thousands,,,0.0,,
4,AUS,Australia,,Not applicable,GAUL1_467_2015,Administrative unit not available,THOUSAND_SQKM,Square kilometers (000's),CROPL,Cropland,...,2019,KM2,Square kilometres,3,Thousands,,,0.0,,


In [11]:
# Filtering to USA, year 2019 - Greg, rename and replace - Chris
landcover_US_2019 = landcover2.loc[(landcover2['COU'] == 'USA') & (landcover2['Year'] == 2019)].copy()
landcover_US_2019.rename(columns={'Large subnational region': 'state', 'VARIABLE': 'variable', 'Value': 'value'}, inplace = True)
# match the capitalization used in state_fips so the merge on state finds DC
landcover_US_2019.replace({'District Of Columbia':'District of Columbia'}, inplace = True)
print(len(landcover_US_2019))
print(type(landcover_US_2019))
landcover_US_2019.head(2)


918
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,COU,Country,SMALL_SUBNATIONAL_REGION,Small subnational region,LARGE_SUBNATIONAL_REGION,state,MEAS,Measure,variable,Land cover class,...,Year,Unit Code,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,value,Flag Codes,Flags
220477,USA,United States,,Not applicable,OECD_US01_2021,Alabama,THOUSAND_SQKM,Square kilometers (000's),FOREST,Tree cover,...,2019,KM2,Square kilometres,3,Thousands,,,87.106024,,
220482,USA,United States,,Not applicable,OECD_US01_2021,Alabama,THOUSAND_SQKM,Square kilometers (000's),GRSL,Grassland,...,2019,KM2,Square kilometres,3,Thousands,,,26.72643,,


In [12]:
# bring in state abbreviation by joining to state_fips on state name
landcover_US_2019_2 = pd.merge(landcover_US_2019, state_fips, on='state')
print(len(landcover_US_2019_2))
print(type(landcover_US_2019_2))
landcover_US_2019_2.head(2)

918
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,COU,Country,SMALL_SUBNATIONAL_REGION,Small subnational region,LARGE_SUBNATIONAL_REGION,state,MEAS,Measure,variable,Land cover class,...,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,value,Flag Codes,Flags,state_abbrev,FIPS
0,USA,United States,,Not applicable,OECD_US01_2021,Alabama,THOUSAND_SQKM,Square kilometers (000's),FOREST,Tree cover,...,Square kilometres,3,Thousands,,,87.106024,,,AL,1
1,USA,United States,,Not applicable,OECD_US01_2021,Alabama,THOUSAND_SQKM,Square kilometers (000's),GRSL,Grassland,...,Square kilometres,3,Thousands,,,26.72643,,,AL,1
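An inner merge silently drops rows whose `state` has no match (for example a capitalization mismatch such as "District Of Columbia"). One way to surface those rows is a left merge with pandas' `indicator` and `validate` options. The toy rows below are illustrative only, not values from the land cover dataset.

```python
import pandas as pd

# Toy rows; the second state name deliberately uses the un-fixed capitalization
landcover = pd.DataFrame({
    "state": ["Alabama", "District Of Columbia"],
    "variable": ["FOREST", "URBAN"],
    "value": [87.106024, 1.0],
})
state_fips = pd.DataFrame({
    "state": ["Alabama", "District of Columbia"],
    "state_abbrev": ["AL", "DC"],
    "FIPS": [1, 11],
})

check = pd.merge(
    landcover, state_fips, on="state", how="left",
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",  # raises if a state appears twice in state_fips
)
unmatched = check.loc[check["_merge"] == "left_only", "state"].drop_duplicates().tolist()
print(unmatched)  # ['District Of Columbia']
```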


In [13]:
# for Chris evaluation - reduce land cover fips area dataset to measurements by percent (rather than by square kilometers)
landcover_US_2019_3 = landcover_US_2019_2.loc[landcover_US_2019_2['MEAS'] == 'PCNT'] 
print(len(landcover_US_2019_3))
landcover_US_2019_3.head(2)

459


Unnamed: 0,COU,Country,SMALL_SUBNATIONAL_REGION,Small subnational region,LARGE_SUBNATIONAL_REGION,state,MEAS,Measure,variable,Land cover class,...,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,value,Flag Codes,Flags,state_abbrev,FIPS
9,USA,United States,,Not applicable,OECD_US01_2021,Alabama,PCNT,Percent of total country area,FOREST,Tree cover,...,Percentage,0,Units,,,64.872148,,,AL,1
10,USA,United States,,Not applicable,OECD_US01_2021,Alabama,PCNT,Percent of total country area,GRSL,Grassland,...,Percentage,0,Units,,,19.90449,,,AL,1


In [14]:
# create landcover table
landcover = landcover_US_2019_3[['state_abbrev','variable','value']]
print(len(landcover))
print(type(landcover))
landcover.head(2)

459
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,state_abbrev,variable,value
9,AL,FOREST,64.872148
10,AL,GRSL,19.90449


In [15]:
# write landcover to csv (this line is active and overwrites the existing csv; comment it out to preserve the file)
landcover.to_csv('Resources/DB_CSV/landcover.csv', index = False)

In [16]:
# check landcover.csv by reading it back
landcover_check = pd.read_csv('Resources/DB_CSV/landcover.csv')
print(len(landcover_check))
print(landcover_check.dtypes)
landcover_check


459
state_abbrev     object
variable         object
value           float64
dtype: object


Unnamed: 0,state_abbrev,variable,value
0,AL,FOREST,64.872148
1,AL,GRSL,19.904490
2,AL,WETL,0.176988
3,AL,SHRUBL,1.222573
4,AL,SPARSE_VEGETATION,0.034765
...,...,...,...
454,VA,SPARSE_VEGETATION,0.025975
455,VA,CROPL,10.927344
456,VA,URBAN,2.967337
457,VA,BARE,0.195108


* Landcover table schema and csv are compatible
* Use landcover.csv to populate landcover db table

## meteorite_main table

db table:

CREATE TABLE meteorite_main (
    id integer,
    name VARCHAR(50),
    recclass VARCHAR(50),
    mass_grams float,
    fall VARCHAR(10),
    year integer,
    reclat float,
    reclong float,
    geolocation varchar(50),
    geometry varchar(50),
    elevation float,
    country varchar(50),
    state_abbrev varchar(10),
    PRIMARY KEY (id)
);
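The csv stores `year` as a plain integer (e.g. 1880), which an `integer` column accepts but a `DATE` column would reject. If the column is kept as `DATE`, one option, sketched here with a hypothetical helper, is to write first-of-year ISO strings. Formatting strings avoids `pd.to_datetime`, since pandas' default nanosecond timestamps cannot represent years before 1677.

```python
import pandas as pd

def year_to_date_string(year: pd.Series) -> pd.Series:
    """Map integer years to 'YYYY-01-01' strings that PostgreSQL can parse as DATE."""
    return year.astype(int).map(lambda y: f"{y:04d}-01-01")

years = pd.Series([1880, 1951, 860])
print(year_to_date_string(years).tolist())
# ['1880-01-01', '1951-01-01', '0860-01-01']
```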


In [19]:
# pull in meteorite found fell info here - Madina sourced from (website here) and cleaned to create Meteorite_Landings_clean.csv
meteorite10 = pd.read_csv('Resources/Raw_Data/Meteorite_Landings_clean.csv')
print(len(meteorite10))
print(meteorite10.dtypes)
meteorite10.head()


38114
Unnamed: 0       int64
name            object
id               int64
nametype        object
recclass        object
mass (g)       float64
fall            object
year             int64
reclat         float64
reclong        float64
GeoLocation     object
geometry        object
dtype: object


Unnamed: 0.1,Unnamed: 0,name,id,nametype,recclass,mass (g),fall,year,reclat,reclong,GeoLocation,geometry
0,0,Aachen,1,Valid,L5,21.0,Fell,1880,50.775,6.08333,"(50.775, 6.08333)",POINT (6.08333 50.775)
1,1,Aarhus,2,Valid,H6,720.0,Fell,1951,56.18333,10.23333,"(56.18333, 10.23333)",POINT (10.23333 56.18333)
2,2,Abee,6,Valid,EH4,107000.0,Fell,1952,54.21667,-113.0,"(54.21667, -113.0)",POINT (-113 54.21667)
3,3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976,16.88333,-99.9,"(16.88333, -99.9)",POINT (-99.9 16.88333)
4,4,Achiras,370,Valid,L6,780.0,Fell,1902,-33.16667,-64.95,"(-33.16667, -64.95)",POINT (-64.95 -33.16667)


In [20]:
# add columns to meteorite10 df, drop unnamed col, rename cols
meteorite10['state_abbrev'] = 'None_Found'
meteorite10['state'] = 'None_Found'
meteorite10['county'] = 'None_Found'
meteorite10['country'] = 'None_Found'
meteorite10['elevation'] = 0.0
meteorite10.drop(['Unnamed: 0'], axis = 1, inplace = True)
meteorite10.rename({'mass (g)':'mass_grams',"GeoLocation":'geolocation'}, axis = 1, inplace = True)
meteorite10.head(2)

Unnamed: 0,name,id,nametype,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,state_abbrev,state,county,country,elevation
0,Aachen,1,Valid,L5,21.0,Fell,1880,50.775,6.08333,"(50.775, 6.08333)",POINT (6.08333 50.775),None_Found,None_Found,None_Found,None_Found,0.0
1,Aarhus,2,Valid,H6,720.0,Fell,1951,56.18333,10.23333,"(56.18333, 10.23333)",POINT (10.23333 56.18333),None_Found,None_Found,None_Found,None_Found,0.0


In [21]:
# reduce table to an approximate US bounding box (18 < lat < 72, -170 < long < -67)
meteorite9 = meteorite10.loc[(meteorite10['reclat']>18)&(meteorite10['reclat']<72)&(meteorite10['reclong']>-170)&(meteorite10['reclong']<-67)].copy()
print(len(meteorite9))
print(meteorite9.dtypes)
meteorite9.head(2)

1799
name             object
id                int64
nametype         object
recclass         object
mass_grams      float64
fall             object
year              int64
reclat          float64
reclong         float64
geolocation      object
geometry         object
state_abbrev     object
state            object
county           object
country          object
elevation       float64
dtype: object


Unnamed: 0,name,id,nametype,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,state_abbrev,state,county,country,elevation
2,Abee,6,Valid,EH4,107000.0,Fell,1952,54.21667,-113.0,"(54.21667, -113.0)",POINT (-113 54.21667),None_Found,None_Found,None_Found,None_Found,0.0
27,Allegan,2276,Valid,H5,32000.0,Fell,1899,42.53333,-85.88333,"(42.53333, -85.88333)",POINT (-85.88333 42.53333),None_Found,None_Found,None_Found,None_Found,0.0
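The bounding-box filter above can also be written with `Series.between`, which keeps the limits in one place. A sketch, assuming pandas 1.3 or later (for `inclusive="neither"`, which matches the strict `>` and `<` used above); `in_bbox` is a hypothetical helper and the three rows come from the outputs above.

```python
import pandas as pd

# Approximate US bounding box used above (exclusive bounds)
LAT_RANGE = (18, 72)
LON_RANGE = (-170, -67)

def in_bbox(df: pd.DataFrame) -> pd.Series:
    """Boolean mask of rows whose reclat/reclong fall strictly inside the box."""
    return (
        df["reclat"].between(*LAT_RANGE, inclusive="neither")
        & df["reclong"].between(*LON_RANGE, inclusive="neither")
    )

points = pd.DataFrame({
    "name": ["Aachen", "Abee", "Allegan"],
    "reclat": [50.775, 54.21667, 42.53333],
    "reclong": [6.08333, -113.0, -85.88333],
})
print(points.loc[in_bbox(points), "name"].tolist())  # ['Abee', 'Allegan']
```

As the map below shows, a box like this still admits Canadian points such as Abee, which is why the later merge with `state_fips` is needed.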


In [22]:
# check lat/longs on the map - the box includes Alaska and Hawaii but also Canadian and Central American locations
# Create a scatter map using the 'reclat' and 'reclong' columns of the dataframe as latitude and longitude, respectively
fig = px.scatter_mapbox(meteorite9, lat="reclat", lon="reclong")

# customize the layout of the Map
fig.update_layout(
    mapbox_style="open-street-map",
    mapbox=dict(center=dict(lat=0, lon=0), zoom=0),
    margin=dict(l=0, r=0, t=0, b=0),
    width=600, height=600
)

# Display the map
fig.show()

In [23]:
# get state/county/country info by reverse-geocoding each lat/long
# https://geopy.readthedocs.io/en/stable/
# https://www.geeksforgeeks.org/get-the-city-state-and-country-names-from-latitude-and-longitude-using-python/
from geopy.exc import GeopyError
from geopy.extra.rate_limiter import RateLimiter

# initialize Nominatim API
geolocator = Nominatim(user_agent="geoapiExercises")
# Nominatim's usage policy allows at most 1 request per second; RateLimiter
# spaces out the calls (and retries a failed call before giving up)
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

for index, row in meteorite9.iterrows():
    # get latitude, longitude from the DataFrame
    rowlat = str(row['reclat'])
    rowlon = str(row['reclong'])

    try:
        rowloc = reverse(rowlat + "," + rowlon)
        # reverse() returns None when no address is found (e.g. open water)
        rowaddress = rowloc.raw.get('address', {}) if rowloc else {}
    except GeopyError:
        # geocoder service error: keep the placeholder values for this row
        rowaddress = {}

    # If no county/state/country is found, set it to "None_Found"
    meteorite9.loc[index, "county"] = rowaddress.get('county', 'None_Found')
    meteorite9.loc[index, "state"] = rowaddress.get('state', 'None_Found')
    meteorite9.loc[index, "country"] = rowaddress.get('country', 'None_Found')


In [24]:
# write meteorite9 to an interim csv for troubleshooting (this line is active and overwrites the existing interim csv)
meteorite9.to_csv('Resources/Raw_Data/meteorite9_interim.csv', index = False)

##### Reverse geocoding failed with "Max retries exceeded"; reload the results from the previously created csv "meteorite9_interim.csv" instead
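Instead of restarting all 1,799 lookups after a failure, the loop can be made resumable: only rows still holding the `None_Found` placeholder are geocoded, and progress is checkpointed periodically. A minimal sketch; `fill_admin_fields` is a hypothetical helper, and `reverse` is any callable that returns an object with a `.raw` dict (as geopy `Location` objects do) or `None`.

```python
import pandas as pd

PLACEHOLDER = "None_Found"

def fill_admin_fields(df, reverse, checkpoint=None, every=50):
    """Reverse-geocode only rows whose 'state' is still the placeholder.

    `checkpoint`, if given, is called with the DataFrame every `every`
    lookups and once at the end (e.g. a lambda that writes the interim csv).
    If `reverse` raises, progress made so far stays in `df`, so a rerun
    picks up where this one stopped.
    """
    todo = df.index[df["state"] == PLACEHOLDER]
    for n, idx in enumerate(todo, start=1):
        lat, lon = df.at[idx, "reclat"], df.at[idx, "reclong"]
        location = reverse(f"{lat}, {lon}")
        address = location.raw.get("address", {}) if location else {}
        for col in ("county", "state", "country"):
            df.at[idx, col] = address.get(col) or PLACEHOLDER
        if checkpoint and n % every == 0:
            checkpoint(df)
    if checkpoint:
        checkpoint(df)
    return df

# Possible wiring in the notebook (user_agent value is a placeholder):
# from geopy.geocoders import Nominatim
# from geopy.extra.rate_limiter import RateLimiter
# reverse = RateLimiter(Nominatim(user_agent="meteorite-db-notebook").reverse, min_delay_seconds=1)
# fill_admin_fields(meteorite9, reverse,
#                   checkpoint=lambda d: d.to_csv('Resources/Raw_Data/meteorite9_interim.csv', index=False))
```

Rows with no address (e.g. points over water) keep the placeholder and are retried on each run; add a separate "tried" flag if that becomes costly.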

In [25]:
meteorite9 = pd.read_csv('Resources/Raw_Data/meteorite9_interim.csv')

In [26]:
print(len(meteorite9))
print(meteorite9.dtypes)
meteorite9.head(2)

1799
name             object
id                int64
nametype         object
recclass         object
mass_grams      float64
fall             object
year              int64
reclat          float64
reclong         float64
geolocation      object
geometry         object
state_abbrev     object
state            object
county           object
country          object
elevation       float64
dtype: object


Unnamed: 0,name,id,nametype,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,state_abbrev,state,county,country,elevation
0,Abee,6,Valid,EH4,107000.0,Fell,1952,54.21667,-113.0,"(54.21667, -113.0)",POINT (-113 54.21667),None_Found,Alberta,Thorhild County,Canada,0.0
1,Allegan,2276,Valid,H5,32000.0,Fell,1899,42.53333,-85.88333,"(42.53333, -85.88333)",POINT (-85.88333 42.53333),None_Found,Michigan,Allegan County,United States,0.0


In [27]:
# bring in state abbreviation by joining to state_fips
meteorite8 = pd.merge(meteorite9,state_fips, on = 'state')
print(len(meteorite8))
print(type(meteorite8))
meteorite8.head(2)

1650
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,name,id,nametype,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,state_abbrev_x,state,county,country,elevation,state_abbrev_y,FIPS
0,Allegan,2276,Valid,H5,32000.0,Fell,1899,42.53333,-85.88333,"(42.53333, -85.88333)",POINT (-85.88333 42.53333),None_Found,Michigan,Allegan County,United States,0.0,MI,26
1,Coleman,5401,Valid,L6,469.0,Fell,1994,43.76111,-84.50778,"(43.76111, -84.50778)",POINT (-84.50778 43.76111),None_Found,Michigan,Midland County,United States,0.0,MI,26


In [28]:
# meteorite8 only contains US values after merging with state_fips
meteorite8['country'].value_counts()

United States    1650
Name: country, dtype: int64

In [29]:
# Create a scatter map using the 'reclat' and 'reclong' columns of the dataframe as latitude and longitude, respectively
fig = px.scatter_mapbox(meteorite8, lat="reclat", lon="reclong")

# customize the layout of the Map
fig.update_layout(
    mapbox_style="open-street-map",
    mapbox=dict(center=dict(lat=0, lon=0), zoom=0),
    margin=dict(l=0, r=0, t=0, b=0),
    width=600, height=600
)

# Display the map
fig.show()

In [30]:
# get list of latitudes
qlats = meteorite8['reclat'].to_list()
print(type(qlats))
qlats

<class 'list'>


[42.53333,
 43.76111,
 44.51667,
 42.38467,
 42.96667,
 46.07944000000001,
 44.64694,
 43.86667,
 41.78333,
 44.61667,
 44.38333,
 44.08333,
 44.36667,
 43.827,
 38.5,
 36.75,
 37.26667,
 39.08333,
 37.91667,
 39.8,
 38.7,
 38.68333,
 37.06667,
 38.3,
 40.266667,
 39.59167,
 39.61667,
 38.65,
 36.82444,
 36.53333,
 40.2875,
 36.55,
 37.73333,
 37.2375,
 37.75,
 37.96667,
 31.805,
 33.6,
 31.83333,
 30.83333,
 32.675,
 29.45,
 30.75,
 31.60833,
 30.125,
 30.7,
 31.25,
 32.16667,
 33.85,
 32.59033,
 35.15,
 34.2,
 33.1,
 33.1,
 33.7825,
 32.9,
 31.66,
 31.76667,
 30.78333,
 30.83333,
 34.0,
 33.58333,
 33.59167,
 29.87861,
 29.86306,
 29.8,
 31.05,
 34.35,
 33.28333,
 33.275,
 33.275,
 33.21667,
 33.21667,
 33.135,
 33.21667,
 32.246109999999994,
 32.03333,
 28.41667,
 34.96667,
 31.91667,
 29.82806,
 34.2,
 36.2,
 35.71667,
 29.0,
 34.02806,
 34.905,
 34.93611,
 34.34667,
 34.34833,
 32.31667,
 32.01667,
 31.991670000000006,
 32.0,
 31.75,
 29.01667,
 36.04333,
 30.75,
 29.1,
 29.00833,

In [31]:
# get a list of longitudes
qlongs = meteorite8['reclong'].to_list()
print(type(qlongs))
qlongs

<class 'list'>


[-85.88333,
 -84.50778000000003,
 -83.95,
 -83.6115,
 -85.76666999999998,
 -88.55972,
 -85.13667,
 -85.51666999999998,
 -84.18333,
 -70.75,
 -68.75,
 -69.48333000000001,
 -69.2,
 -70.24717,
 -94.3,
 -93.5,
 -89.58333,
 -94.4,
 -92.08333,
 -91.5,
 -90.23333,
 -91.15,
 -93.55,
 -94.36667,
 -94.683333,
 -94.92194,
 -94.86667,
 -94.33333,
 -93.76111,
 -91.8,
 -95.37667,
 -93.1,
 -89.85,
 -92.78694,
 -90.5,
 -90.31667,
 -97.01,
 -96.46667,
 -98.83333,
 -97.76667,
 -94.51167,
 -96.0,
 -95.95,
 -102.85833,
 -103.11667,
 -96.11667,
 -99.03333,
 -95.1,
 -101.8,
 -101.77217,
 -102.71667,
 -101.5,
 -96.7,
 -102.95,
 -102.18111,
 -102.28333,
 -96.97167,
 -99.98333,
 -103.46667,
 -97.5,
 -101.5,
 -99.8,
 -103.02667,
 -96.87222,
 -96.92667,
 -98.8,
 -99.46667,
 -101.4,
 -102.96667,
 -102.96,
 -102.96,
 -102.18333,
 -102.18333,
 -102.27833,
 -102.18333,
 -99.99306,
 -99.25,
 -98.25,
 -101.93333,
 -98.03333,
 -96.90861,
 -100.48333,
 -102.45,
 -102.28333,
 -103.25,
 -102.68,
 -100.91333,
 -100.94167,


In [32]:
# get elevation in location dict using lat/long
# Dependencies
import time
import requests

# url = f"https://api.opentopodata.org/v1/etopo1?locations={qlat},{qlong}"

# build a "lat,long" string for each meteorite
qlatlong = [f'{lat},{lon}' for lat, lon in zip(qlats, qlongs)]
for strlalo in qlatlong:
    print(strlalo)

responses = []

for qlat in qlatlong:
    url = f"https://api.opentopodata.org/v1/etopo1?locations={qlat}"
    response = requests.get(url, timeout=30)
    # stop on HTTP errors (e.g. rate limiting) instead of storing an error body
    response.raise_for_status()
    responses.append(response.json())
    # the public opentopodata API documents a limit of 1 call per second
    time.sleep(1)

42.53333,-85.88333
43.76111,-84.50778000000003
44.51667,-83.95
42.38467,-83.6115
42.96667,-85.76666999999998
46.07944000000001,-88.55972
44.64694,-85.13667
43.86667,-85.51666999999998
41.78333,-84.18333
44.61667,-70.75
44.38333,-68.75
44.08333,-69.48333000000001
44.36667,-69.2
43.827,-70.24717
38.5,-94.3
36.75,-93.5
37.26667,-89.58333
39.08333,-94.4
37.91667,-92.08333
39.8,-91.5
38.7,-90.23333
38.68333,-91.15
37.06667,-93.55
38.3,-94.36667
40.266667,-94.683333
39.59167,-94.92194
39.61667,-94.86667
38.65,-94.33333
36.82444,-93.76111
36.53333,-91.8
40.2875,-95.37667
36.55,-93.1
37.73333,-89.85
37.2375,-92.78694
37.75,-90.5
37.96667,-90.31667
31.805,-97.01
33.6,-96.46667
31.83333,-98.83333
30.83333,-97.76667
32.675,-94.51167
29.45,-96.0
30.75,-95.95
31.60833,-102.85833
30.125,-103.11667
30.7,-96.11667
31.25,-99.03333
32.16667,-95.1
33.85,-101.8
32.59033,-101.77217
35.15,-102.71667
34.2,-101.5
33.1,-96.7
33.1,-102.95
33.7825,-102.18111
32.9,-102.28333
31.66,-96.97167
31.76667,-99.98333
30.
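The loop above sends one request per point (1,650 requests). The opentopodata public API documentation, at the time of writing, allows up to 100 locations per request (joined with `|`), at most 1 call per second and 1,000 calls per day, so batching cuts the run to 17 requests. A sketch of building the batched URLs; `batched_urls` is a hypothetical helper, and the limits should be checked against the current docs.

```python
from typing import Iterator, Sequence

BASE_URL = "https://api.opentopodata.org/v1/etopo1"

def batched_urls(lats: Sequence[float], lons: Sequence[float], batch_size: int = 100) -> Iterator[str]:
    """Yield one request URL per batch of up to `batch_size` 'lat,lon' pairs joined by '|'."""
    if len(lats) != len(lons):
        raise ValueError("lats and lons must have the same length")
    pairs = [f"{lat},{lon}" for lat, lon in zip(lats, lons)]
    for start in range(0, len(pairs), batch_size):
        yield f"{BASE_URL}?locations={'|'.join(pairs[start:start + batch_size])}"

# Possible use in the notebook:
# import time, requests
# responses = []
# for url in batched_urls(qlats, qlongs):
#     r = requests.get(url, timeout=30)
#     r.raise_for_status()
#     responses.append(r.json())
#     time.sleep(1)
```

With batching, each response holds up to 100 results, so elevations must be read from every entry in `results`, not just `results[0]`.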

In [33]:
# pull elevations from location dict (each single-point response has one result)
elevations = [response['results'][0]['elevation'] for response in responses]

elevations


[225.0,
 218.0,
 387.0,
 279.0,
 225.0,
 510.0,
 355.0,
 329.0,
 243.0,
 220.0,
 25.0,
 44.0,
 76.0,
 68.0,
 264.0,
 294.0,
 123.0,
 292.0,
 281.0,
 179.0,
 147.0,
 173.0,
 416.0,
 257.0,
 318.0,
 339.0,
 329.0,
 295.0,
 391.0,
 269.0,
 305.0,
 270.0,
 162.0,
 481.0,
 296.0,
 234.0,
 170.0,
 216.0,
 534.0,
 310.0,
 93.0,
 31.0,
 96.0,
 812.0,
 1220.0,
 85.0,
 497.0,
 143.0,
 1016.0,
 868.0,
 1268.0,
 1005.0,
 200.0,
 1126.0,
 1037.0,
 962.0,
 163.0,
 526.0,
 1078.0,
 212.0,
 982.0,
 449.0,
 1208.0,
 122.0,
 96.0,
 533.0,
 590.0,
 1001.0,
 1162.0,
 1152.0,
 1152.0,
 989.0,
 989.0,
 1006.0,
 989.0,
 681.0,
 486.0,
 47.0,
 1075.0,
 339.0,
 118.0,
 602.0,
 1230.0,
 1147.0,
 624.0,
 1185.0,
 873.0,
 868.0,
 1010.0,
 1035.0,
 243.0,
 423.0,
 407.0,
 784.0,
 299.0,
 80.0,
 1203.0,
 1687.0,
 149.0,
 119.0,
 1127.0,
 1127.0,
 288.0,
 167.0,
 1094.0,
 131.0,
 131.0,
 234.0,
 1154.0,
 1162.0,
 952.0,
 1094.0,
 1086.0,
 623.0,
 207.0,
 1057.0,
 1057.0,
 1217.0,
 797.0,
 1011.0,
 1086.0,
 1059.0,
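Assigning `elevations` to `meteorite8` relies on the list having exactly one value per row, in order. A more defensive extraction flattens every result from every response and fails on an error response. This is a sketch: `extract_elevations` is hypothetical, and it assumes each response carries a `status` field equal to `"OK"` on success (as opentopodata responses do at the time of writing) alongside the `results` list used above.

```python
from typing import List, Optional

def extract_elevations(responses: List[dict]) -> List[Optional[float]]:
    """Flatten opentopodata-style JSON responses into one elevation per point."""
    elevations = []
    for resp in responses:
        if resp.get("status") != "OK":
            raise RuntimeError(f"API error: {resp.get('error', resp.get('status'))}")
        # elevation may be None for points outside the dataset's coverage
        elevations.extend(result.get("elevation") for result in resp["results"])
    return elevations

# Possible use before assigning to the DataFrame:
# elevations = extract_elevations(responses)
# if len(elevations) != len(meteorite8):
#     raise ValueError("elevation count does not match meteorite rows")
# meteorite8['elevation'] = elevations
```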
 

In [34]:
# add elevation to meteorite8
meteorite8['elevation'] = elevations
print(len(meteorite8))
print(type(meteorite8))
meteorite8.head(2)

1650
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,name,id,nametype,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,state_abbrev_x,state,county,country,elevation,state_abbrev_y,FIPS
0,Allegan,2276,Valid,H5,32000.0,Fell,1899,42.53333,-85.88333,"(42.53333, -85.88333)",POINT (-85.88333 42.53333),None_Found,Michigan,Allegan County,United States,225.0,MI,26
1,Coleman,5401,Valid,L6,469.0,Fell,1994,43.76111,-84.50778,"(43.76111, -84.50778)",POINT (-84.50778 43.76111),None_Found,Michigan,Midland County,United States,218.0,MI,26


In [35]:
meteorite8.rename({'state_abbrev_y':'state_abbrev'},axis=1, inplace = True)
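The `_x`/`_y` suffixes appear because both DataFrames have a `state_abbrev` column when they are merged. A sketch of an alternative: drop the `None_Found` placeholder column before merging, so the result has a single `state_abbrev` and no rename is needed. The toy rows below are illustrative.

```python
import pandas as pd

meteorites = pd.DataFrame({
    "name": ["Allegan", "Abee"],
    "state": ["Michigan", "Alberta"],
    "state_abbrev": ["None_Found", "None_Found"],  # placeholder column
})
state_fips = pd.DataFrame({"state": ["Michigan"], "state_abbrev": ["MI"], "FIPS": [26]})

# Inner merge on state keeps only US states and yields one state_abbrev column
merged = pd.merge(meteorites.drop(columns="state_abbrev"), state_fips, on="state")
print(merged.columns.tolist())  # ['name', 'state', 'state_abbrev', 'FIPS']
```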

In [40]:
# save US data with county info to an interim csv for troubleshooting (this line is active and overwrites the existing file)
meteorite8.to_csv('Resources/Raw_Data/meteorite8_interim.csv', index = False)

In [37]:
# create meteorite_main
meteorite = meteorite8[['id','name','recclass','mass_grams','fall','year','reclat','reclong','geolocation','geometry','elevation','country','state_abbrev']]
print(len(meteorite))
print(type(meteorite))
meteorite.head(2)

1650
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,id,name,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,elevation,country,state_abbrev
0,2276,Allegan,H5,32000.0,Fell,1899,42.53333,-85.88333,"(42.53333, -85.88333)",POINT (-85.88333 42.53333),225.0,United States,MI
1,5401,Coleman,L6,469.0,Fell,1994,43.76111,-84.50778,"(43.76111, -84.50778)",POINT (-84.50778 43.76111),218.0,United States,MI


In [38]:
# write meteorite_main to csv (this line is active and overwrites the existing csv; comment it out to preserve the file)
meteorite.to_csv('Resources/DB_CSV/meteorite_main.csv', index = False)

In [39]:
# check meteorite_main.csv by reading it back
meteorite_main_check = pd.read_csv('Resources/DB_CSV/meteorite_main.csv')
print(len(meteorite_main_check))
print(meteorite_main_check.dtypes)
meteorite_main_check.head()


1650
id                int64
name             object
recclass         object
mass_grams      float64
fall             object
year              int64
reclat          float64
reclong         float64
geolocation      object
geometry         object
elevation       float64
country          object
state_abbrev     object
dtype: object


Unnamed: 0,id,name,recclass,mass_grams,fall,year,reclat,reclong,geolocation,geometry,elevation,country,state_abbrev
0,2276,Allegan,H5,32000.0,Fell,1899,42.53333,-85.88333,"(42.53333, -85.88333)",POINT (-85.88333 42.53333),225.0,United States,MI
1,5401,Coleman,L6,469.0,Fell,1994,43.76111,-84.50778,"(43.76111, -84.50778)",POINT (-84.50778 43.76111),218.0,United States,MI
2,22766,Rose City,H5,10600.0,Fell,1921,44.51667,-83.95,"(44.51667, -83.95)",POINT (-83.95 44.51667),387.0,United States,MI
3,24337,Worden,L5,1551.0,Fell,1997,42.38467,-83.6115,"(42.38467, -83.6115)",POINT (-83.6115 42.38467),279.0,United States,MI
4,10955,Grand Rapids,"Iron, ungrouped",51700.0,Found,1883,42.96667,-85.76667,"(42.96667, -85.76667)",POINT (-85.76667 42.96667),225.0,United States,MI


* meteorite_main table schema and csv are compatible
* Use meteorite_main.csv to populate meteorite_main db table
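When loading these csvs into PostgreSQL, listing the target columns explicitly avoids two pitfalls noted above: `landcover.csv` has no `id` column (the `Serial` default fills it), and `state.csv` orders FIPS and area_sqkm differently from the DDL. A sketch that builds a psql `\copy` command from a csv header line; `copy_command` is a hypothetical helper and assumes psql is the loading tool.

```python
import csv
import io

def copy_command(table: str, csv_path: str, header_line: str) -> str:
    """Build a psql \\copy command whose column list comes from the csv header."""
    columns = next(csv.reader(io.StringIO(header_line)))
    column_list = ", ".join(columns)
    return f"\\copy {table} ({column_list}) FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"

print(copy_command("landcover", "Resources/DB_CSV/landcover.csv", "state_abbrev,variable,value"))
# \copy landcover (state_abbrev, variable, value) FROM 'Resources/DB_CSV/landcover.csv' WITH (FORMAT csv, HEADER true)

# For a real file, read the header first:
# with open('Resources/DB_CSV/state.csv', newline='') as f:
#     print(copy_command('state', 'Resources/DB_CSV/state.csv', f.readline()))
```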