# Meet the Course Applicants

**Goal:** To get a better sense of the kind of people who sign up for Applied Data Science Lab: where they're from, how old they are, what they've previously studied, and more.

- Extract and transform applicant demographic information using PyMongo.
- Enrich demographic information using an open-source library.
- Create a choropleth map to visualize nationality.
- Build a sorting function to visualize education level.

In [7]:
from pprint import PrettyPrinter

import pandas as pd
import plotly.express as px
from country_converter import CountryConverter
from pymongo import MongoClient

Instantiate PrettyPrinter

In [9]:
pp = PrettyPrinter(indent=2)
print("pp type:", type(pp))

pp type: <class 'pprint.PrettyPrinter'>
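
As a minimal sketch of what the pretty printer is for, the block below formats a nested dictionary shaped roughly like a MongoDB document. The `applicant` record and its field names are hypothetical sample data, not taken from the course database.

```python
from pprint import PrettyPrinter

pp = PrettyPrinter(indent=2)

# Hypothetical nested record (field names are illustrative assumptions)
applicant = {
    "countryISO2": "PT",
    "highestDegreeEarned": "Bachelor's degree",
    "admissionsQuiz": {"status": "incomplete", "attempts": [1, 2]},
}

# pformat returns the formatted string; pp.pprint(applicant) would print it directly
formatted = pp.pformat(applicant)
print(formatted)
```

Nested documents that would otherwise run past the line width are wrapped and indented, which makes them easier to scan than the output of a plain `print`.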


Connect to MongoDB Client

In [10]:
client = MongoClient(host="localhost", port=27017)
print("client type:", type(client))

client type: <class 'pymongo.mongo_client.MongoClient'>


Explore

Country Converter: Open-Source Software

In [11]:
df_nationality = pd.read_csv("data/df_nationality.csv")
df_nationality.head()

Unnamed: 0.1,Unnamed: 0,country_iso2,count
0,138,DM,1
1,51,BA,1
2,54,MO,1
3,35,CR,1
4,74,PT,1


In [14]:
cc = CountryConverter()
df_nationality["country_name"] = cc.convert(df_nationality["country_iso2"], to="name_short")

print("df_nationality shape:", df_nationality.shape)
df_nationality.head()

nan not found in ISO3


df_nationality shape: (139, 4)


Unnamed: 0.1,Unnamed: 0,country_iso2,count,country_name
0,138,DM,1,Dominica
1,51,BA,1,Bosnia and Herzegovina
2,54,MO,1,Macau
3,35,CR,1,Costa Rica
4,74,PT,1,Portugal
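
The `nan not found` message suggests that at least one value in `country_iso2` is missing. One possible cause, which the source does not confirm, is that `pd.read_csv` treats the string `"NA"` (the ISO2 code for Namibia) as a missing value by default. The sketch below reproduces that behavior on a hypothetical two-row CSV and shows how `keep_default_na=False` preserves the literal string.

```python
import io

import pandas as pd

# Hypothetical CSV text; "NA" is the ISO2 code for Namibia
csv_text = "country_iso2,count\nNA,3\nPT,1\n"

# Default parsing converts the string "NA" to NaN
df_default = pd.read_csv(io.StringIO(csv_text))
print("missing codes:", df_default["country_iso2"].isna().sum())

# keep_default_na=False keeps "NA" as a literal string
df_literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print("codes:", df_literal["country_iso2"].tolist())
```

Checking `df_nationality["country_iso2"].isna().sum()` before converting would show whether this applies to the real data.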


Applicants Nationality Bar Chart

In [20]:
# Create horizontal bar chart
fig = px.bar(
    data_frame=df_nationality.tail(10),
    x="count",
    y="country_name",
    orientation="h",
    title="Applicants by Country"
)
# Set axis labels
fig.update_layout(xaxis_title="Frequency [count]", yaxis_title="Country")
# fig.show("png")
fig.show()
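
Taking `tail(10)` only yields the ten most common countries if the rows are already sorted by `count` in ascending order. A minimal sketch of making that explicit, using hypothetical counts rather than the course data:

```python
import pandas as pd

# Hypothetical counts for illustration only
df = pd.DataFrame(
    {
        "country_name": ["Portugal", "Nigeria", "India", "Kenya"],
        "count": [1, 6, 4, 2],
    }
)

# Sort ascending so the largest counts come last, then keep the top 3
top = df.sort_values("count").tail(3)
print(top["country_name"].tolist())
```

With an ascending sort, a horizontal bar chart built from `top` should place the largest bar at the top, since the first row is drawn at the bottom of the axis.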

Normalize Nationality

In [19]:
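# NOTE: .count() returns the number of non-null rows (139 here), so each value
# is divided by the number of countries; divide by .sum() instead to get each
# country's share of all applicants.
df_nationality["count_pct"] = (df_nationality["count"] / df_nationality["count"].count()) * 100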

print("df_nationality shape:", df_nationality.shape)
df_nationality.head()

df_nationality shape: (139, 5)


Unnamed: 0.1,Unnamed: 0,country_iso2,count,country_name,count_pct
0,138,DM,1,Dominica,0.719424
1,51,BA,1,Bosnia and Herzegovina,0.719424
2,54,MO,1,Macau,0.719424
3,35,CR,1,Costa Rica,0.719424
4,74,PT,1,Portugal,0.719424
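
In the output above, 0.719424 equals 100/139, that is, a count of 1 divided by the number of rows rather than by the total number of applicants. A minimal sketch of normalizing by the total instead, using hypothetical counts rather than the course data:

```python
import pandas as pd

# Hypothetical counts for illustration only
df = pd.DataFrame({"country_iso2": ["PT", "CR", "NG"], "count": [1, 1, 6]})

# .count() would give the number of rows (3); .sum() gives total applicants (8)
df["count_pct"] = df["count"] / df["count"].sum() * 100
print(df)
```

Normalized this way, the `count_pct` column sums to 100, so each value reads directly as that country's share of applicants.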


Normalized Nationality Bar Chart

In [23]:
# Create horizontal bar chart
fig = px.bar(
    data_frame=df_nationality.tail(10),
    x="count_pct",
    y="country_name",
    orientation="h",
    title="DS Applicants by Country"
)
# Set axis labels
fig.update_layout(xaxis_title="Frequency [%]", yaxis_title="Country")
# fig.show("png")
fig.show()

Country Converter: Take Two

In [24]:
df_nationality["country_iso3"] = cc.convert(df_nationality["country_iso2"], to="ISO3")

print("df_nationality shape:", df_nationality.shape)
df_nationality.head()

nan not found in ISO3


df_nationality shape: (139, 6)


Unnamed: 0.1,Unnamed: 0,country_iso2,count,country_name,count_pct,country_iso3
0,138,DM,1,Dominica,0.719424,DMA
1,51,BA,1,Bosnia and Herzegovina,0.719424,BIH
2,54,MO,1,Macau,0.719424,MAC
3,35,CR,1,Costa Rica,0.719424,CRI
4,74,PT,1,Portugal,0.719424,PRT
