## Importing Libraries

Okay, now that the setup is out of the way, let's start working with data. The first thing we need to do is import some libraries. We will use the popular Pandas library to load and explore the data, along with NumPy, and we'll use Seaborn and Matplotlib for some basic data visualizations. These libraries were installed before this lesson, so you don't have to install them. If you are doing this on your own, though, you'll need to install them before running the code below.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Now that we've imported our libraries, let's bring in our .csv file using Pandas. Notice that in the cell above, we imported Pandas under the alias "pd" (short for Pandas). Technically, we could have chosen any name, but pd is the standard convention, so let's stick with it.

Let's load in some data. We're going to use a [dataset from the New York Times](https://github.com/nytimes/hhs-child-migrant-data) on all of the unaccompanied minors who have entered the United States since 2015. It turns out that a significant number of immigrants, 

The reporter, Hannah Dreier, won a Pulitzer Prize for her work on this issue. The Pulitzer is widely regarded as the most prestigious award in American journalism. [Read more about this issue here](https://www.pulitzer.org/winners/hannah-dreier-new-york-times). 

The **sponsor category** column is a number defined as:

1. parent or legal guardian;
2. immediate relative;
3. distant relative or unrelated adult individual.
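Because the sponsor category is stored as a number, it can help to translate the codes into readable labels before counting or plotting them. Here is a minimal sketch using Pandas' `Series.map()` on a small made-up DataFrame. The column name `"Sponsor Category"` is an assumption, so check `df.columns` for the exact spelling in the real file.

```python
import pandas as pd

# Made-up stand-in for the real data; "Sponsor Category" is an assumed column name
sample = pd.DataFrame({"Sponsor Category": [1, 3, 2, 1, 3]})

# Labels taken from the definitions above
SPONSOR_LABELS = {
    1: "parent or legal guardian",
    2: "immediate relative",
    3: "distant relative or unrelated adult individual",
}

# map() looks up each code in the dictionary; codes missing from it become NaN
sample["Sponsor Label"] = sample["Sponsor Category"].map(SPONSOR_LABELS)
print(sample)
```

Any code that isn't in the dictionary (or any blank cell) shows up as `NaN`, which makes unexpected values easy to spot.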

In [None]:
df = pd.read_csv("child-immigrant-data.csv", dtype={"Sponsor Zipcode":str})
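The `dtype={"Sponsor Zipcode": str}` argument matters because Pandas guesses column types. A ZIP code that starts with zero looks like a number, so by default the leading zero gets dropped. This small sketch uses a made-up, two-row CSV held in memory (via `io.StringIO`) to show the difference.

```python
import io
import pandas as pd

# A tiny made-up CSV with one ZIP code that starts with a zero
csv_text = "Sponsor Zipcode\n02134\n90210\n"

# Default parsing infers an integer column, so the leading zero disappears
as_numbers = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps each ZIP code exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Sponsor Zipcode": str})

print(as_numbers["Sponsor Zipcode"].tolist())  # [2134, 90210]
print(as_text["Sponsor Zipcode"].tolist())     # ['02134', '90210']
```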

Now, let's see what the data looks like. We can call the `head()` method on our DataFrame to see the first five rows of data.

In [None]:
df.head()

We can also put a number inside the parentheses to choose how many rows to show.

In [None]:
df.head(10)

In [None]:
df.tail()

In [None]:
# returns the number of rows and columns as a (rows, columns) tuple
df.shape

### Interview Functions

Let's interview our data. Here are some functions we can run. In each one, "df" stands in for whatever variable holds your data. It's short for "DataFrame," which is what Pandas calls this data type.

`df.head()` - get the first 5 rows of your data (or specify number)

`df.tail()` - get the last 5 rows of your data (or specify number)

`df.sample(5)` - get a random sampling of 5 rows of your data

`df.columns` - get a list of all the columns

`df.info()` - get the number of non-null values and the data type for each column

`df.shape` - get the number of rows and columns

`df.describe()` - get summary statistics (count, mean, standard deviation, min, quartiles, max) for each numeric column
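To see what each of these returns without the full dataset, here is a small sketch that runs them on a made-up six-row DataFrame. The column names mirror the real data, but the values are invented for illustration. Note that `describe()` skips the text column by default and only summarizes numbers.

```python
import pandas as pd

# Invented rows that only mimic the shape of the real dataset
toy = pd.DataFrame({
    "Child's Gender": ["F", "M", "M", "F", "M", "F"],
    "Length of Detention": [12, 45, 30, 8, 60, 21],
})

print(toy.head(3))                    # first 3 rows
print(toy.tail(2))                    # last 2 rows
print(toy.sample(2, random_state=0))  # 2 random rows; random_state makes it repeatable
print(list(toy.columns))              # column names
toy.info()                            # non-null counts and dtypes (prints directly)
print(toy.shape)                      # (6, 2)
print(toy.describe())                 # statistics for the numeric column only
```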

Let's take these functions for a spin:

In [None]:
df.columns

In [None]:
df.describe()

In [None]:
df[["Child's Country of Origin"]]

In [None]:
df[["Length of Detention"]].median()

In [None]:
df[["Length of Detention"]].mean()

In [None]:
df[["Length of Detention"]].max()

In [None]:
df[["Length of Detention"]].min()
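The cells above use double brackets, `df[["Length of Detention"]]`, which select a one-column DataFrame. Single brackets, `df["Length of Detention"]`, select a Series instead. That difference changes what the statistics return: on a DataFrame you get a small Series with one value per column, and on a Series you get a single number. A sketch on invented values:

```python
import pandas as pd

toy = pd.DataFrame({"Length of Detention": [12, 45, 30, 8, 60]})

one_col_frame = toy[["Length of Detention"]]  # double brackets -> DataFrame
one_col_series = toy["Length of Detention"]   # single brackets -> Series

print(type(one_col_frame).__name__)   # DataFrame
print(type(one_col_series).__name__)  # Series

# On a DataFrame, median() returns a Series with one entry per column
print(one_col_frame.median())
# On a Series, median() returns a plain number
print(one_col_series.median())        # 30.0
```

Single brackets are usually handier when you want one number to use in a sentence or a later calculation.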

In [None]:
df["Child's Gender"].value_counts()

In [None]:
df["Child's Country of Origin"].value_counts()
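`value_counts()` counts how often each distinct value appears and sorts the result from most to least common. Passing `normalize=True` returns shares of the total instead of raw counts, which is often easier to report. A sketch with placeholder country names (not real data):

```python
import pandas as pd

# Placeholder values standing in for a column of countries
countries = pd.Series([
    "Country A", "Country B", "Country A",
    "Country C", "Country A", "Country B",
])

counts = countries.value_counts()                # raw counts, largest first
shares = countries.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
print(counts.head(2))  # the two most common values
```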

In [None]:
df["Child's Country of Origin"].value_counts().plot.barh()

In [None]:
df["Child's Country of Origin"].value_counts()[0:10]

In [None]:
ten_highest = df["Child's Country of Origin"].value_counts()[0:10]
ten_highest.plot.barh()

In [None]:
ten_highest.sort_values(ascending=True).plot.barh()

In [None]:
df["Length of Detention"].plot.hist()

In [None]:
df["Length of Detention"].plot.hist(bins=100)
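The `bins` argument sets how many equal-width intervals the range of values is split into; Pandas draws the histogram with Matplotlib, which uses 10 bins by default. The sketch below uses NumPy's `np.histogram`, which does the same equal-width binning, on invented detention lengths so you can see the counts and bin widths directly.

```python
import numpy as np

# Invented detention lengths in days
lengths = np.array([3, 5, 8, 12, 15, 21, 30, 45, 90, 300])

# More bins means narrower intervals and finer detail
counts_10, edges_10 = np.histogram(lengths, bins=10)
counts_100, edges_100 = np.histogram(lengths, bins=100)

print(counts_10)                      # [7 1 1 0 0 0 0 0 0 1]
print(edges_10[1] - edges_10[0])      # width of each of the 10 bins (about 29.7)
print(edges_100[1] - edges_100[0])    # width of each of the 100 bins (about 2.97)
```

With only 10 bins, most values pile into the first bar; with 100 bins, the short detentions spread out and the long outlier stands alone.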

In [None]:
df["Length of Detention"].describe()

In [None]:
df[df["Length of Detention"] == 1747]
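The cell above filters on a hard-coded number. A slightly sturdier sketch computes the maximum first and filters on that, so the cell still works if the data is updated. `DataFrame.nlargest()` is a related option for seeing the top few rows at once. The rows below are invented.

```python
import pandas as pd

# Invented rows for illustration
toy = pd.DataFrame({
    "Child's Country of Origin": ["Country A", "Country B", "Country C"],
    "Length of Detention": [40, 400, 12],
})

# Find the longest detention, then keep every row that matches it
longest = toy["Length of Detention"].max()
longest_rows = toy[toy["Length of Detention"] == longest]
print(longest_rows)

# The two rows with the longest detentions, largest first
print(toy.nlargest(2, "Length of Detention"))
```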

In [None]:
# Dates are still plain text here, so they sort alphabetically rather than chronologically
df["Child's Date of Entry"].value_counts().sort_index().plot.line()

In [None]:
df["Child's Date of Entry"]   = pd.to_datetime(df["Child's Date of Entry"], format="%m/%d/%Y")
df["Child's Date of Release"] = pd.to_datetime(df["Child's Date of Release"], format="%m/%d/%Y")
df.head()
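Converting with `pd.to_datetime` is what makes date order work. As text, month/day/year strings sort character by character, so a January date can land before a December date from the previous year. This sketch uses three invented dates to show the order before and after parsing.

```python
import pandas as pd

# Invented dates written as month/day/year text
raw = pd.Series(["12/31/2015", "01/02/2016", "06/15/2015"])

# As text, values sort alphabetically, which is not chronological
print(raw.sort_values().tolist())
# ['01/02/2016', '06/15/2015', '12/31/2015']

# After parsing, they sort by the actual calendar date
parsed = pd.to_datetime(raw, format="%m/%d/%Y")
print(parsed.sort_values().dt.strftime("%Y-%m-%d").tolist())
# ['2015-06-15', '2015-12-31', '2016-01-02']
```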

In [None]:
# Now that the column holds datetimes, sorting the index puts the daily counts in date order
df["Child's Date of Entry"].value_counts().sort_index().plot.line()