# Subject: Data Science Foundation

## Session 14 - ArcGIS API for Python.

### Exercise 2 - Descriptive Statistics: From an HTML Table to a Pandas DataFrame to a Portal Item

Let us read the Wikipedia article "List of countries by cigarette consumption per capita",
which lists countries by annual per capita consumption of tobacco cigarettes.
We will explore the resulting DataFrame (descriptive statistics and correlation) and create a map.

https://en.wikipedia.org/wiki/List_of_countries_by_cigarette_consumption_per_capita

In [1]:
import pandas as pd

In [2]:
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_cigarette_consumption_per_capita")[0]

In [3]:
df.head()

Unnamed: 0,0,1,2
0,Ranking,Country/Territory,Number of cigarettes per person aged ≥ 15 per ...
1,1,Montenegro,4124.53
2,2,Belarus,3831.62
3,3,Lebanon,3023.15
4,4,Macedonia,2732.23


In [4]:
df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0))

In [5]:
df.head()

Unnamed: 0,Ranking,Country/Territory,Number of cigarettes per person aged ≥ 15 per year[7]
1,1,Montenegro,4124.53
2,2,Belarus,3831.62
3,3,Lebanon,3023.15
4,4,Macedonia,2732.23
5,5,Russia,2690.33
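
The header promotion in cell 4 can be tried on a small stand-in frame. The following is a minimal sketch, not the notebook's own method: the `raw_demo` frame is built by hand from two of the rows shown above, its third column label is shortened for illustration, and `promote_first_row` is a hypothetical helper name. Passing a plain list to `columns` is one way to avoid the stray `0` that appears as a column-index name in the `df.dtypes` output.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_html: the header text sits in row 0.
raw_demo = pd.DataFrame([
    ["Ranking", "Country/Territory", "Cigarettes per year"],  # label shortened for the demo
    ["1", "Montenegro", "4124.53"],
    ["2", "Belarus", "3831.62"],
])

def promote_first_row(frame):
    """Return a copy whose first row has become the column labels."""
    out = frame.copy()
    out.columns = list(out.iloc[0])      # plain list, so the columns index gets no name
    return out.drop(index=out.index[0])  # remove the row that held the labels

clean_demo = promote_first_row(raw_demo)
print(clean_demo)
```

The row index still starts at 1 after the drop, which matches the notebook, where the index happens to line up with the ranking.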


In [6]:
df.dtypes

0
Ranking                                                  object
Country/Territory                                        object
Number of cigarettes per person aged ≥ 15 per year[7]    object
dtype: object

In [7]:
df.shape

(182, 3)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 182 entries, 1 to 182
Data columns (total 3 columns):
Ranking                                                  182 non-null object
Country/Territory                                        182 non-null object
Number of cigarettes per person aged ≥ 15 per year[7]    182 non-null object
dtypes: object(3)
memory usage: 5.7+ KB


Let's shorten the column names and check the data structure again

In [23]:
df.rename(columns={'Ranking': 'Rank', 'Country/Territory': 'Country','Number of cigarettes per person aged ≥ 15 per year[7]': 'Ncpp'}, inplace=True)
df.head()

Unnamed: 0,Rank,Country,Ncpp
1,1,Montenegro,4124.53
2,2,Belarus,3831.62
3,3,Lebanon,3023.15
4,4,Macedonia,2732.23
5,5,Russia,2690.33
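
The long column label carries a Wikipedia footnote marker (`[7]`), and such markers can change when the article is edited. Because `rename` ignores keys that match no column, an exact-match rename could then silently leave the label unchanged. The sketch below is one possible safeguard, assuming the marker always appears at the end of the label; `df_demo` is an empty stand-in frame that only carries the column labels shown in the notebook.

```python
import pandas as pd

# Stand-in frame with the column labels shown in the notebook output.
df_demo = pd.DataFrame(columns=[
    "Ranking",
    "Country/Territory",
    "Number of cigarettes per person aged ≥ 15 per year[7]",
])

# Remove a trailing footnote marker such as "[7]" from every label.
df_demo.columns = df_demo.columns.str.replace(r"\[\d+\]$", "", regex=True)

# Rename using the cleaned labels; the short names follow the notebook.
df_demo = df_demo.rename(columns={
    "Ranking": "Rank",
    "Country/Territory": "Country",
    "Number of cigarettes per person aged ≥ 15 per year": "Ncpp",
})
print(list(df_demo.columns))
```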


We need the "Number of cigarettes per person aged ≥ 15 per year[7]" column (renamed to Ncpp above) in numeric format. Let us convert it and, while doing so, turn any values that cannot be parsed into NaN, which stands for Not a Number.

In [24]:
converted_column = pd.to_numeric(df["Ncpp"], errors='coerce')  # with errors='coerce', values that cannot be parsed become NaN
df['Ncpp'] = converted_column
df.head()

Unnamed: 0,Rank,Country,Ncpp
1,1,Montenegro,4124.53
2,2,Belarus,3831.62
3,3,Lebanon,3023.15
4,4,Macedonia,2732.23
5,5,Russia,2690.33
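
To see what `errors='coerce'` does with malformed input, here is a minimal sketch on made-up strings. The values in `raw_values` are hypothetical and are not taken from the Wikipedia table. It also shows one way to list the entries that failed to parse, so they can be inspected rather than lost.

```python
import pandas as pd

# Hypothetical raw strings: two clean numbers, one with a thousands
# separator, and one textual placeholder.
raw_values = pd.Series(["4124.53", "3831.62", "1,264.74", "n/a"])

numeric = pd.to_numeric(raw_values, errors="coerce")  # unparseable entries become NaN

# Keep the original strings that did not survive the conversion.
failed = raw_values[numeric.isna()]
print(numeric.isna().sum(), failed.tolist())
```

In this notebook, `df.describe()` later reports a count of 182 for both numeric columns, which matches the 182 rows, so no value was coerced to NaN here.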


In [25]:
converted_column = pd.to_numeric(df["Rank"], errors = 'coerce')
df['Rank'] = converted_column
df.head()

Unnamed: 0,Rank,Country,Ncpp
1,1,Montenegro,4124.53
2,2,Belarus,3831.62
3,3,Lebanon,3023.15
4,4,Macedonia,2732.23
5,5,Russia,2690.33


Let's find the ranking of our countries

In [29]:
df.loc[df['Country'] == 'Spain']

Unnamed: 0,Rank,Country,Ncpp
47,47,Spain,1264.74


In [30]:
df.loc[df['Country'] == 'Philippines']

Unnamed: 0,Rank,Country,Ncpp
45,45,Philippines,1291.08
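
The two lookups can also be combined into a single selection. The sketch below assumes a small stand-in frame, `df_demo`, built from rows shown in the notebook output, and uses `Series.isin` to build one boolean mask for several countries.

```python
import pandas as pd

# Rows copied from the notebook output above.
df_demo = pd.DataFrame({
    "Rank": [1, 45, 47],
    "Country": ["Montenegro", "Philippines", "Spain"],
    "Ncpp": [4124.53, 1291.08, 1264.74],
})

# Select several countries with one boolean mask.
subset = df_demo.loc[df_demo["Country"].isin(["Spain", "Philippines"])]
print(subset)
```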


Let's check the descriptive statistics

In [28]:
df.describe()

Unnamed: 0,Rank,Ncpp
count,182.0,182.0
mean,91.5,818.75544
std,52.683014,757.071004
min,1.0,14.96
25%,46.25,213.755
50%,91.5,569.115
75%,136.75,1265.79
max,182.0,4124.53
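
The summary above can be read a little further with plain arithmetic. The sketch below is an illustration, not part of the original exercise: it takes the quartiles and mean reported by `df.describe()` and applies Tukey's 1.5 × IQR rule of thumb to the top five values listed earlier in the notebook.

```python
# Quartiles and mean as reported by df.describe() in the notebook.
q1, median, q3, mean = 213.755, 569.115, 1265.79, 818.75544

iqr = q3 - q1                 # interquartile range
upper_fence = q3 + 1.5 * iqr  # Tukey's rule of thumb for high outliers

# Top five countries from the table earlier in the notebook.
top5 = {
    "Montenegro": 4124.53, "Belarus": 3831.62, "Lebanon": 3023.15,
    "Macedonia": 2732.23, "Russia": 2690.33,
}
high_outliers = [name for name, value in top5.items() if value > upper_fence]

print(round(iqr, 3), round(upper_fence, 4))
print(high_outliers)
print(mean > median)  # a mean above the median suggests a right-skewed distribution
```

Since the table is sorted in descending order, any country below the fourth row is also below the fence, so checking the top five is enough for this rule.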


In [33]:
df.drop(['Country'], axis=1).corr(method='spearman')

Unnamed: 0,Rank,Ncpp
Rank,1.0,-1.0
Ncpp,-1.0,1.0
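
A coefficient of exactly -1.0 is expected here: the ranking is derived from the consumption figures sorted in descending order, so the two columns are perfectly and inversely monotonic. The sketch below, using the top five rows from the notebook as a stand-in frame, illustrates that the Spearman coefficient equals the Pearson correlation of the rank-transformed columns.

```python
import pandas as pd

# Top five rows from the notebook; Rank follows Ncpp sorted in descending order.
df_demo = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Ncpp": [4124.53, 3831.62, 3023.15, 2732.23, 2690.33],
})

# Spearman correlation computed by hand: Pearson correlation of the ranks.
manual = df_demo["Rank"].rank().corr(df_demo["Ncpp"].rank())  # Pearson is the default

# The same coefficient through the DataFrame method used in the notebook.
builtin = df_demo.corr(method="spearman").loc["Rank", "Ncpp"]

print(manual, builtin)
```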


## Plot as a map

Let us connect to our GIS to geocode this data and present it as a map.

In [31]:
from arcgis.gis import GIS
import json

gis = GIS("https://www.arcgis.com", "<username>", "<password>")  # replace the placeholders with your own credentials
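
Notebooks are often shared, so it is generally safer to keep the username and password out of the notebook source. The following is a minimal sketch under stated assumptions: the environment variable names `ARCGIS_USERNAME` and `ARCGIS_PASSWORD` and the helpers `load_credentials` and `connect` are hypothetical names chosen for this example and are not part of the ArcGIS API for Python.

```python
import os
from getpass import getpass

def load_credentials(env=os.environ):
    """Return (username, password), preferring environment variables.

    ARCGIS_USERNAME and ARCGIS_PASSWORD are hypothetical variable names
    chosen for this sketch; they are not defined by the ArcGIS API.
    """
    username = env.get("ARCGIS_USERNAME") or input("ArcGIS username: ")
    password = env.get("ARCGIS_PASSWORD") or getpass("ArcGIS password: ")
    return username, password

def connect(url="https://www.arcgis.com"):
    """Create the GIS connection without hardcoding secrets in the notebook."""
    from arcgis.gis import GIS  # imported here so load_credentials works without arcgis
    username, password = load_credentials()
    return GIS(url, username, password)
```

With the helpers defined, the connection cell could become `gis = connect()`, which prompts only for values that are missing from the environment.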

In [32]:
# add feature layer

In [34]:
fc = gis.content.import_data(df,{"CountryCode": 'Country'})

In [36]:
map1 = gis.map('world')
map1

Let us use smart mapping to render the points with varying sizes representing the number of cigarettes per person aged ≥ 15 per year

In [37]:
map1.add_layer(fc, {"renderer":"ClassedSizeRenderer",
               "field_name": "Ncpp"})

In [43]:
item_properties = {
    "title": "World Wide cigarette consumption per capita",
    "tags" : "cigarette",
    "snippet": "World Wide cigarette consumption per capita",
    "description":"2014 World wide Cigarette Consumption",
    "text": json.dumps({"featureCollection": {"layers": [dict(fc.layer)]}}),
    "type": "Feature Collection",
    "typekeywords": "Data, Feature Collection, Singlelayer",
    "extent" : "-102.5272,-41.7886,172.5967,64.984"
}

item = gis.content.add(item_properties)

In [44]:
search_result = gis.content.search("World Wide cigarette consumption per capita")
search_result[0]