In this notebook I create a choropleth map that shows how many people responded to the [Stack Overflow Developer Survey 2018](https://insights.stackoverflow.com/survey/2018/) in relation to their countries' populations.

First load the result file into a pandas DataFrame and create a series of value counts from the `Country` column.

In [119]:
import pandas as pd

df_public = pd.read_csv('../input/survey_results_public.csv', low_memory=False)
df_countries = pd.DataFrame(df_public.Country.value_counts())
df_countries.head(5)
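
To make the counting step concrete, here is a minimal plain-Python sketch of what `value_counts` computes: it tallies each distinct value, skips missing entries by default, and orders the result by descending count. The `answers` list is made up for illustration and is not taken from the survey file.

```python
from collections import Counter

# Made-up answers standing in for the survey's Country column (None = no answer).
answers = ['India', 'Germany', 'India', None, 'India', 'Germany']

# Count every non-missing value, as value_counts does by default.
counts = Counter(a for a in answers if a is not None)

# most_common() returns (value, count) pairs sorted by descending count.
print(counts.most_common())  # [('India', 3), ('Germany', 2)]
```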

Next, create a country index that maps the country names used in the developer survey to the [ISO 3166-1 alpha-3](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3) country codes, which identify countries in the geographic data used later for plotting the map. Then show the names that could not be mapped with the `iso3166` package.

In [120]:
from iso3166 import countries

country_index = {name: countries.get(name).alpha3 for name in df_countries.index if name in countries}
set(df_countries.index) - set(country_index)
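
The lookup-then-diff pattern in this cell can be sketched without the `iso3166` package. In the example below, `known_codes` is a hypothetical three-entry stand-in for the package's lookup, and `survey_names` is an illustrative list rather than the real index.

```python
# Hypothetical stand-in for the iso3166 lookup; only three entries for illustration.
known_codes = {'Germany': 'DEU', 'India': 'IND', 'United States': 'USA'}

# Names as they might appear in the survey (illustrative, not the real index).
survey_names = ['Germany', 'India', 'South Korea', 'United States']

# Keep only the names the lookup recognizes, like the dict comprehension above.
mapped = {name: known_codes[name] for name in survey_names if name in known_codes}

# Whatever is left over needs a manual mapping.
unmapped = set(survey_names) - set(mapped)
print(unmapped)  # {'South Korea'}
```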

Add the missing ISO codes manually to the index.

In [130]:
country_index.update({
    'Bolivia': 'BOL',
    'Cape Verde': 'CPV',
    'Congo, Republic of the...': 'COG',
    'Czech Republic': 'CZE',
    "Democratic People's Republic of Korea": 'PRK',
    'Democratic Republic of the Congo': 'COD',
    'Hong Kong (S.A.R.)': 'HKG',
    'Iran, Islamic Republic of...': 'IRN',
    'Libyan Arab Jamahiriya': 'LBY',
    'Micronesia, Federated States of...': 'FSM',
    'North Korea': 'PRK',
    'Republic of Korea': 'KOR',
    'Republic of Moldova': 'MDA',
    'South Korea': 'KOR',
    'The former Yugoslav Republic of Macedonia': 'MKD',
    'United Kingdom': 'GBR',
    'United Republic of Tanzania': 'TZA',
    'Venezuela, Bolivarian Republic of...': 'VEN'
})

pd.Series(country_index).value_counts().head()

The output above shows that each of the two Koreas has two name-to-ISO-code mappings in the index. So next, group by the `iso` column, sum the respondent counts, and show the top entries.

In [132]:
df_countries['iso'] = df_countries.index.map(lambda x: country_index.get(x))
iso_index = df_countries.groupby('iso').sum()
iso_index.sort_values('Country', ascending=False).head()
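
The effect of grouping by ISO code is that counts recorded under different spellings of the same country are merged. A minimal plain-Python sketch of that idea follows; the counts are invented for illustration and are not the survey's numbers.

```python
from collections import defaultdict

# Invented counts; two spellings refer to the same country.
counts = {'South Korea': 3, 'Republic of Korea': 2, 'Germany': 7}
name_to_iso = {'South Korea': 'KOR', 'Republic of Korea': 'KOR', 'Germany': 'DEU'}

# Sum the counts of all names that share an ISO code.
totals = defaultdict(int)
for name, n in counts.items():
    totals[name_to_iso[name]] += n

# Sort by descending total, like sort_values(..., ascending=False).
top = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(top)  # [('DEU', 7), ('KOR', 5)]
```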

In the next cell we import [GeoPandas](http://geopandas.org/), create a GeoDataFrame containing data from [naturalearthdata.com](http://www.naturalearthdata.com/) and remove Antarctica so it doesn't take up unnecessary space. Then we add two columns: the total number of respondents per country and the number of respondents per 1 million inhabitants.

In [None]:
import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')).to_crs('+proj=robin')
world = world[world.name != 'Antarctica']

# Look up each country's respondent count by ISO code; codes without survey data become NaN.
world['respondents'] = world['iso_a3'].map(iso_index['Country'])
world['respondent_ratio'] = world['respondents'] / world['pop_est'] * 1_000_000
world.sort_values('respondent_ratio', ascending=False).head(10)
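
The ratio column is a simple normalization: respondents divided by population, scaled to 1 million inhabitants. The sketch below restates it as a standalone function; `per_million` is a hypothetical helper name and the numbers are invented. Returning `None` for missing data mirrors how countries without respondents end up with no ratio.

```python
def per_million(respondents, population):
    """Return respondents per 1 million inhabitants, or None if data is missing."""
    if respondents is None or not population:
        return None
    return respondents / population * 1_000_000

# Invented numbers: 50 respondents in a country of 2 million people.
ratio = per_million(50, 2_000_000)
print(round(ratio, 2))

# A country without survey data yields no ratio.
print(per_million(None, 2_000_000))
```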

Now plot the map. Countries with and without data are plotted separately, and annotations and a legend are added so that the graphic can be interpreted without additional context. See this notebook on [creating choropleth maps with GeoPandas](http://ramiro.org/notebook/geopandas-choropleth/) for more details.

In [153]:
known = world.dropna(subset=['respondent_ratio'])
unknown = world[world['respondent_ratio'].isna()]

ax = known.plot(column='respondent_ratio', cmap='viridis_r', figsize=(20, 12), scheme='fisher_jenks', k=7, legend=True, edgecolor='#aaaaaa')
unknown.plot(ax=ax, color='#ffffff', hatch='//', edgecolor='#aaaaaa')

ax.set_title('Stack Overflow Developer Survey 2018 Respondents per 1 Million People', fontdict={'fontsize': 20}, loc='left')
description = '''
Survey data: kaggle.com/stackoverflow/stack-overflow-2018-developer-survey • Population estimates: naturalearthdata.com • 
Source code: kaggle.com/ramirogomez/stack-overflow-survey-2018-respondents-world-map • Author: Ramiro Gómez - ramiro.org'''.strip()
ax.annotate(description, xy=(0.065, 0.12), size=12, xycoords='figure fraction')

ax.set_axis_off()
legend = ax.get_legend()
legend.set_bbox_to_anchor((.11, .4))
legend.prop.set_size(12)

## Conclusion

While the USA is by far the country with the most respondents, we see that Iceland, Estonia, Israel, Switzerland and New Zealand have the highest ratios of developer survey respondents in relation to population. A map just showing the total numbers would certainly look very different and tell a different story.