<a href="https://colab.research.google.com/github/sakeefkarim/intro.python.24/blob/main/code/introduction_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# An Introduction to `Python` for Social Research

[Sakeef M. Karim](https://www.sakeefkarim.com/)

sakeef.karim@nyu.edu

## Preliminaries

This notebook is designed to provide an introduction to select libraries in `Python` that are *essential* for data science—inclusive of [`pandas`](https://pandas.pydata.org/) (for data wrangling) and [`seaborn`](https://seaborn.pydata.org/) (for data visualization). More concretely, it will offer some basic code for (i) manipulating tabular data frames; and (ii) visualizing descriptive statistics or other quantities of interest. However, this notebook **will not** provide an exhaustive overview of the affordances of `Python` for research in the social and behavioural sciences, nor will it get into the weeds of [`scikit-learn`](https://scikit-learn.org/stable/) or other machine learning libraries that `Python` is known for (fear not: we'll delve into machine learning in a few short months).


## Loading Libraries

To kick things off, let's load our essential libraries.

In [None]:
# We use canonical naming conventions for our libraries and submodules:

import scipy as sp
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# An experimental submodule that brings the "grammar of graphics" into seaborn:

import seaborn.objects as so

### Note

If you're using [Google Colab](https://colab.google/), you can "mount" your Google Drive folders onto a Colab session to save plots, data sets and so on. To programmatically mount your Drive folder(s), run the following lines:

In [None]:
from google.colab import drive
drive.mount('/drive')
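
Outside Colab, the `google.colab` import above will typically fail. Here is a minimal sketch of one way to guard the mount step. The helper names `in_colab` and `mount_drive_if_colab` are hypothetical (they are not part of Colab or of this notebook), and the default mount point simply mirrors the `'/drive'` path used above.

```python
import importlib.util


def in_colab():
    """Return True if the google.colab package can be found in this environment."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ImportError:
        # Raised when the parent "google" package is not installed at all.
        return False


def mount_drive_if_colab(mount_point="/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    if not in_colab():
        return False
    from google.colab import drive  # only importable inside Colab
    drive.mount(mount_point)
    return True


print("Running in Colab:", in_colab())
```

Calling `mount_drive_if_colab()` in a local Jupyter session would then return `False` rather than raise an import error.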

## Loading Data

To read in different _kinds_ of input data, we will use methods (i.e., functions) from the `pandas` library. For today's session, we will largely work with a dataset that we already encountered during the [PopAgingDataViz](https://popagingdataviz.com/) workshop: [gapminder](https://jennybc.github.io/gapminder/).


### Note

If you're using Jupyter _locally_, you will want to (i) clone/download our companion GitHub repository, [`intro.python.24`](https://github.com/sakeefkarim/intro.python.24); and (ii) _change your working directory_ (via the command line) so that it points to the `intro.python.24` folder you just cloned/downloaded.

To implement (ii), feel free to uncomment the code snippet below and run the cell, but **remember to adjust the code as needed** (depending on where `intro.python.24` is located on your computer).





In [None]:
# cd "THE PATH TO ... /intro.python.24"
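
The relative paths used later in this notebook (e.g., `data/gapminder.csv`) only resolve if the working directory is the repository folder. Below is a rough sketch of how one might check this from Python. `looks_like_repo` and `enter_repo` are hypothetical helpers, and the check assumes that a `data/` subfolder is enough to identify the cloned repository.

```python
import os
from pathlib import Path


def looks_like_repo(folder="."):
    """Return True if `folder` contains the data/ subfolder the relative paths rely on."""
    return (Path(folder) / "data").is_dir()


def enter_repo(folder):
    """Change the working directory to `folder`, but only if it contains data/."""
    if not looks_like_repo(folder):
        raise FileNotFoundError(f"No data/ folder found in {folder!r}")
    os.chdir(folder)
    return Path.cwd()


# Report where the session currently points and whether data/ is visible from there.
print("Working directory:", Path.cwd())
print("Contains data/:", looks_like_repo())
```

If the second line prints `False`, calling `enter_repo` with the path to your copy of `intro.python.24` would switch directories for the rest of the session.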

### Excel and CSV Files

In [None]:
# From the companion GitHub repository:

gapminder_excel = pd.read_excel("https://github.com/sakeefkarim/intro.python.24/raw/main/data/gapminder.xlsx")

gapminder_csv = pd.read_csv("https://github.com/sakeefkarim/intro.python.24/raw/main/data/gapminder.csv")

# If you have the intro.python.24 folder on your machine:

# gapminder_excel = pd.read_excel("data/gapminder.xlsx")

# gapminder_csv = pd.read_csv("data/gapminder.csv")


### Stata Files

In [None]:
# From the companion GitHub repository:

gapminder_dta = pd.read_stata("https://github.com/sakeefkarim/intro.python.24/raw/main/data/gapminder.dta")

# If you have the intro.python.24 folder on your machine:

# gapminder_dta = pd.read_stata("data/gapminder.dta")


### SPSS Files

In [None]:
# If you want to read in SPSS files, you may want to launch this notebook locally.

# Then, make sure you have pyreadstat installed within your conda environment (say, by
# running the following in your terminal):

# conda install pyreadstat

# You can then run the code below:

# gapminder_spss = pd.read_spss("data/gapminder.sav")
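
Each file format above pairs with its own `pandas` reader, and mixing them up (say, pointing `read_excel` at a `.dta` file) is an easy mistake. As a hedged sketch, the lookup below picks the reader from a file's extension. `READERS`, `reader_name` and `load_table` are hypothetical helpers, not part of `pandas`, and `load_table` assumes `pandas` is installed (along with `openpyxl` for Excel files or `pyreadstat` for SPSS files).

```python
from pathlib import Path

# File extension -> name of the pandas function that reads that format.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".dta": "read_stata",
    ".sav": "read_spss",
}


def reader_name(path):
    """Look up the pandas reader that matches the file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader registered for {suffix!r} files")
    return READERS[suffix]


def load_table(path):
    """Read `path` with the matching pandas reader."""
    import pandas as pd  # imported lazily so the lookup works without pandas
    return getattr(pd, reader_name(path))(path)


print(reader_name("data/gapminder.dta"))  # → read_stata
```

For instance, `load_table("data/gapminder.csv")` would call `pd.read_csv` on the local copy of the file.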


### R Files

In [None]:
# In Google Colab:

!pip install pyreadr

# Locally, install pyreadr within your conda environment, e.g.:

# conda install pyreadr

In [None]:
# Here, the alias is idiosyncratic (no canonical conventions for pyreadr)

import pyreadr as pyr

# Loading R files from GitHub

rds_url = "https://github.com/sakeefkarim/intro.python.24/raw/main/data/gapminder.rds"

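# This path assumes Google Drive is mounted at /drive and that a Colab folder already exists there:

destination = '/drive/My Drive/Colab/gapminder.rds'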

pyr.download_file(rds_url, destination)

gapminder_r = pyr.read_r(destination)

# Checking to see which objects are available:

print(gapminder_r.keys())

# Only one key (None), ergo:

gapminder_rds = gapminder_r[None]

# Importing R files locally:

# gapminder_rds = pyr.read_r("data/gapminder.rds")[None]
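
`pyr.read_r` returns a dictionary-like object, so the data frame has to be pulled out by key. Here is a small sketch of a helper that does this for single-object files such as `.rds`. `only_object` is a hypothetical name, and the placeholder dictionary below merely stands in for a real `read_r` result.

```python
def only_object(objects):
    """Return the single value stored in a dict-like read_r result."""
    if len(objects) != 1:
        raise ValueError(f"Expected exactly one object, found {len(objects)}")
    return next(iter(objects.values()))


# Stand-in for the result of reading an .rds file: one entry under the key None.
placeholder = {None: "a data frame would sit here"}
print(only_object(placeholder))
```

With a real file, `only_object(pyr.read_r(destination))` would play the same role as indexing with `[None]`, while failing loudly if the file held several objects.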