# Introduction to Pandas

This introduction is meant to give you a feel for how we will work with data in Pandas by introducing some of the basic Pandas functions.

We will cover topics such as:

*   Creating `Series` and `DataFrame` objects
*   Viewing, summarizing, and sorting data
*   Selecting columns and rows



# Imports

There are two Python packages that we need to import in order to run all of our functions.

## Pandas

Pandas is a powerful data analysis library. We can think of pandas as a programmable replacement for Excel. It allows us to view and explore data tables (called DataFrames), manipulate data, calculate statistics, and prepare data for plotting.

## NumPy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Pandas is built on top of NumPy and by default stores our data in NumPy arrays, so we can apply NumPy functions directly to pandas objects.

In [None]:
import numpy as np
import pandas as pd
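
As a minimal sketch of what applying NumPy functions to pandas objects looks like, the example below calls two NumPy functions directly on a Series. The sample values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration
values = pd.Series([1.0, 4.0, 9.0])

# An element-wise NumPy function applied to a Series returns a Series
# that keeps the original index
roots = np.sqrt(values)
print(roots)

# NumPy reductions also accept a Series directly
total = np.sum(values)
print(total)
```

The result of `np.sqrt` is still a pandas Series, so it can be used with any of the pandas methods shown in this notebook.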

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [None]:
series = pd.Series([1, 3, 5, np.nan, 6, 8])
print(series)
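
The short sketch below inspects the same kind of Series to show what the default index looks like and how the `np.nan` entry is treated. It assumes nothing beyond the values used above.

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 3, 5, np.nan, 6, 8])

# The default index is the integers 0..5
print(list(series.index))

# np.nan is a float, so the whole Series is stored as float64
print(series.dtype)

# count() and mean() skip missing values by default
print(series.count())
print(series.mean())
```

Because one value is missing, `count()` reports five values rather than six, and `mean()` is computed over those five.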

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [None]:
# First create the dates index
dates = pd.date_range('20130101', periods=6)
print(dates)

In [None]:
# Now create a dataframe using the dates index
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
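
Note that `np.random.randn` produces different numbers on every run. If reproducible values are wanted, one option is a seeded generator from `np.random.default_rng`. This is an assumption for illustration, not something the cells above do; the seed `0` and the name `df_seeded` are arbitrary.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)

# A seeded generator returns the same numbers every time it is recreated
rng = np.random.default_rng(0)

# standard_normal((6, 4)) draws a 6x4 array from the standard normal distribution
df_seeded = pd.DataFrame(rng.standard_normal((6, 4)),
                         index=dates, columns=list('ABCD'))
print(df_seeded)
```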

Creating a DataFrame by passing a dict of objects that can each be converted to a series-like structure:

In [None]:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2

The columns of the resulting DataFrame have different dtypes.

In [None]:
df2.dtypes
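
A smaller sketch of the same idea may make the mechanics easier to see: scalar values in the dict are repeated to match the length of the series-like entries, and each column keeps its own dtype. The column names `x`, `y`, and `z` are hypothetical.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({'x': 1.0,                     # scalar, repeated for every row
                      'y': pd.Series([10, 20, 30]), # sets the length and the index
                      'z': 'foo'})                  # scalar, repeated for every row

print(small)
print(small.dtypes)
```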

# View Data

Here is how to view the top and bottom rows of the frame:

In [None]:
# Show the first 5 rows
df.head()

In [None]:
# Show the last 3 rows
df.tail(3)

Display the index and column names:

In [None]:
# Show the index
df.index

In [None]:
# Show the columns
df.columns

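`DataFrame.to_numpy()` gives a NumPy representation of the underlying data. When the columns have different dtypes, as in `df2`, NumPy has to find a single dtype that can hold every value, which usually ends up being `object`.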

In [None]:
df.to_numpy()

In [None]:
df2.to_numpy()
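
The difference between the two calls can be sketched with two tiny frames, one with only float columns and one that mixes floats and strings. The frames and their names are made up for illustration.

```python
import numpy as np
import pandas as pd

# All columns are floats, so the array keeps a numeric dtype
numeric = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
numeric_array = numeric.to_numpy()
print(numeric_array.dtype)

# Floats and strings share no numeric dtype, so NumPy falls back to object
mixed = pd.DataFrame({'a': [1.0, 2.0], 'b': ['x', 'y']})
mixed_array = mixed.to_numpy()
print(mixed_array.dtype)
```

Note that the array contains only the values; the index and column labels are not part of the output.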

`describe()` shows a quick statistical summary of your data:

In [None]:
df.describe()
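
Since `df` holds random numbers, the summary changes from run to run. As a sketch with fixed, made-up values, the rows that `describe()` returns for numeric columns can be checked by hand:

```python
import pandas as pd

# Hypothetical values chosen so the statistics are easy to verify
tiny = pd.DataFrame({'A': [1.0, 2.0, 3.0]})

summary = tiny.describe()
print(summary)

# The summary is itself a DataFrame, so single statistics can be looked up
print(summary.loc['mean', 'A'])
```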

Transposing your data:

In [None]:
df.T

Sorting by an axis:

In [None]:
df.sort_index(axis=1, ascending=False)

Sorting by values:

In [None]:
df.sort_values(by='B')
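
The two kinds of sorting are easier to compare on a small frame with fixed values. The sketch below uses made-up data; note that both methods return a new DataFrame and leave the original unchanged.

```python
import pandas as pd

frame = pd.DataFrame({'A': [9, 8, 7], 'B': [3, 1, 2]},
                     index=['x', 'y', 'z'])

# axis=1 sorts the column labels; ascending=False puts 'B' before 'A'
by_columns = frame.sort_index(axis=1, ascending=False)
print(by_columns)

# sort_values reorders the rows by the values in column 'B'
by_values = frame.sort_values(by='B')
print(by_values)
```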

# Selecting Data

Selecting a single column, which yields a Series:

In [None]:
# First way to select a column
df['A']

In [None]:
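# Second way to select a column (attribute access; only works when the
# column name is a valid Python identifier that does not clash with a DataFrame method)
df.A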
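
The two ways of selecting a column are not fully interchangeable. The sketch below, using a made-up frame, shows a case where both agree and two cases where only the bracket form works.

```python
import pandas as pd

# Hypothetical frame: 'unit price' contains a space and 'count' is also
# the name of a DataFrame method
frame = pd.DataFrame({'A': [1, 2],
                      'unit price': [9.5, 3.0],
                      'count': [10, 20]})

# For a simple name both forms return the same Series
same = frame['A'].equals(frame.A)
print(same)

# A name with a space can only be reached with brackets
print(frame['unit price'])

# frame.count is the DataFrame method, not the column
print(callable(frame.count))
print(frame['count'])
```

For this reason the bracket form is the safer default in scripts, while attribute access is mostly a convenience for interactive work.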

Selecting a row by its index label with `.loc`, which also yields a Series:

In [None]:
df.loc['2013-01-01']
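
As a sketch of what comes back from a row selection, the example below builds a small frame with fixed, made-up values and a daily datetime index, then selects one row by label. The row is a Series whose index is the frame's column names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=3)
frame = pd.DataFrame(np.arange(6).reshape(3, 2),
                     index=dates, columns=['A', 'B'])

# A date string that matches one index entry selects that single row
row = frame.loc['2013-01-02']
print(row)

# The same row can be selected with an explicit Timestamp
row_ts = frame.loc[pd.Timestamp('2013-01-02')]
print(row.equals(row_ts))
```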