<a href="https://colab.research.google.com/github/sakeefkarim/intro.python.24/blob/main/code/introduction_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An Introduction to `Python` for Social Research

[Sakeef M. Karim](https://www.sakeefkarim.com/)

sakeef.karim@nyu.edu

## Preliminaries

This notebook introduces select `Python` libraries that are *essential* for data science, including [`pandas`](https://pandas.pydata.org/) (for data wrangling) and [`seaborn`](https://seaborn.pydata.org/) (for data visualization). More concretely, it offers basic code for (i) manipulating tabular data frames and (ii) visualizing descriptive statistics or other quantities of interest. However, this notebook **will not** provide an exhaustive overview of what `Python` affords research in the social and behavioural sciences, nor will it get into the weeds of [`scikit-learn`](https://scikit-learn.org/stable/) or the other machine learning libraries that `Python` is known for (fear not: we'll delve into machine learning in a few short months).


## Loading Libraries

To kick things off, let's load our essential libraries.

In [2]:
# We use canonical naming conventions for our libraries and submodules:

import scipy as sp
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# An experimental submodule that brings the "grammar of graphics" into seaborn:

import seaborn.objects as so
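
Because `seaborn.objects` only ships with `seaborn` 0.12 or later, it can help to confirm which library versions are installed before going further. The following is a minimal sketch that uses the standard library's `importlib.metadata`; the names `libraries` and `installed` are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the libraries imported above.
libraries = ["scipy", "pandas", "matplotlib", "seaborn"]

# Map each library to its installed version (or None if it is missing).
installed = {}

for name in libraries:
    try:
        installed[name] = version(name)
    except PackageNotFoundError:
        installed[name] = None
    print(f"{name}: {installed[name] or 'not installed'}")
```

If `seaborn` is reported as older than 0.12, the `import seaborn.objects as so` line will fail until the library is upgraded.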

### Note

If you're using [Google Colab](https://colab.google/), you can "mount" your Google Drive folders onto a Colab session to save plots, data sets and so on. To programmatically mount your Drive folder(s), run the following lines:

In [3]:
from google.colab import drive
drive.mount('/drive')

Mounted at /drive


## Loading Data

To read in different _kinds_ of input data, we will use functions from the `pandas` library. For today's session, we will largely work with a dataset that we already encountered during the [PopAgingDataViz](https://popagingdataviz.com/) workshop: [gapminder](https://jennybc.github.io/gapminder/).


### Note

If you're using Jupyter _locally_, you will want to (i) clone or download our companion GitHub repository, [`intro.python.24`](https://github.com/sakeefkarim/intro.python.24), and (ii) _change your working directory_ (via the command line) so that it points to the `intro.python.24` folder you just cloned or downloaded.
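
As an alternative to the command line, the working directory can also be checked and changed from inside a notebook cell. The sketch below assumes the folder lives at `~/intro.python.24`; that location (and the `repo_dir` name) is hypothetical, so adjust it to wherever you cloned or downloaded the repository.

```python
import os
from pathlib import Path

# Hypothetical location of the cloned/downloaded folder; adjust to your machine.
repo_dir = Path.home() / "intro.python.24"

# Only change directories if the folder actually exists.
if repo_dir.is_dir():
    os.chdir(repo_dir)

# Confirm where relative paths such as "./data/gapminder.csv" will resolve from.
print("Working directory:", Path.cwd())
print("Found data folder:", (Path.cwd() / "data").is_dir())
```

If the second line prints `False`, relative paths such as `./data/gapminder.csv` will not resolve from the current directory.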





### Excel and CSV Files

In [14]:
# From the companion GitHub repository:

gapminder_excel = pd.read_excel("https://github.com/sakeefkarim/intro.python.24/raw/main/data/gapminder.xlsx")

gapminder_csv = pd.read_csv("https://github.com/sakeefkarim/intro.python.24/raw/main/data/gapminder.csv")

# If you have the intro.python.24 folder on your machine:

# gapminder_excel = pd.read_excel("./data/gapminder.xlsx")

# gapminder_csv = pd.read_csv("./data/gapminder.csv")


| Unnamed: 0 | country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
| ... | ... | ... | ... | ... | ... | ... |
| 1699 | Zimbabwe | Africa | 1987 | 62.351 | 9216418 | 706.157306 |
| 1700 | Zimbabwe | Africa | 1992 | 60.377 | 10704340 | 693.420786 |
| 1701 | Zimbabwe | Africa | 1997 | 46.809 | 11404948 | 792.449960 |
| 1702 | Zimbabwe | Africa | 2002 | 39.989 | 11926563 | 672.038623 |
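
The `Unnamed: 0` column in the output above is what `pandas` typically produces when a file was saved with its row index and then read back without saying so. As a minimal sketch, assuming that is what happened here, passing `index_col=0` treats the first column as the index instead. The `csv_text` string below is a small stand-in built from the first three rows shown above, not the actual file.

```python
import io

import pandas as pd

# Stand-in for gapminder.csv: three rows from the output above, with an
# empty first header (an assumption about how the file was written).
csv_text = """,country,continent,year,lifeExp,pop,gdpPercap
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.853030
2,Afghanistan,Asia,1962,31.997,10267083,853.100710
"""

# Default behaviour: the unnamed first column is kept as "Unnamed: 0".
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())

# With index_col=0, the first column becomes the row index instead.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.columns.tolist())
```

The same argument can be passed when reading from the GitHub URL or a local path; whether to keep or drop the column is a judgment call that depends on whether the saved index carries any meaning.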
