# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the data frame in the R programming language. You can think of a DataFrame as a collection of Series objects that share the same index.
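As a minimal sketch of that idea, the cell below builds a small DataFrame from two Series that share an index. The sales figures and the names `north`, `south`, and `sales` are made up for illustration; they are not taken from the workbook loaded later.

```python
import pandas as pd

# Illustrative values only (not from the Excel workbook)
months = ['Jan', 'Feb', 'Mar']
north = pd.Series([9000, 7500, 8200], index=months)
south = pd.Series([4000, 5200, 6100], index=months)

# Each Series becomes one column; the shared index becomes the row labels
sales = pd.DataFrame({'North': north, 'South': south})
print(sales)
```

Because both Series carry the same labels, pandas lines them up row by row without any extra work.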

In [None]:
import pandas as pd
import numpy as np

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
df = pd.read_excel('3_NEWS_Sales.xlsx',sheet_name='NEWS')

In [None]:
df

In [None]:
df.set_index('Month', inplace = True)
df

## Selection and Indexing

Let's look at the main ways to grab data from a DataFrame:

In [None]:
df['North']

In [None]:
# Pass a list of column names
df[ ['North','South'] ]

In [None]:
# Bracket notation looks up columns, so this raises a KeyError: 'Jan' is a row label
df['Jan']

In [None]:
df.loc['Jan']

DataFrame columns are just Series, and so is a single row selected with `.loc`:

In [None]:
type(df['North'])

In [None]:
type(df.loc['Jan'])
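A small sketch of what those two `type` calls report, using a made-up two-row frame (`demo` and its values are hypothetical, not the workbook data). A column comes back as a Series indexed by the row labels, and a row comes back as a Series indexed by the column names.

```python
import pandas as pd

# Hypothetical data for illustration
demo = pd.DataFrame(
    {'North': [9000, 7500], 'South': [4000, 5200]},
    index=['Jan', 'Feb'],
)

col = demo['North']    # Series indexed by the row labels
row = demo.loc['Jan']  # Series indexed by the column names

print(type(col).__name__, list(col.index))
print(type(row).__name__, list(row.index))
```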

**Creating a new column:**

In [None]:
df['North_South'] = df['North'] + df['South']

In [None]:
df

**Removing columns:**

In [None]:
df.drop('North_South',axis=1)

In [None]:
# Not inplace unless specified!
df

In [None]:
df.drop('North_South',axis=1,inplace=True)

In [None]:
df
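The cells above show that `drop` leaves the original untouched unless `inplace=True` is passed. An alternative that some people find easier to follow is to reassign the result. The sketch below uses a made-up frame named `demo` (hypothetical values) and the `columns=` keyword, which is equivalent to `axis=1`.

```python
import pandas as pd

# Hypothetical data for illustration
demo = pd.DataFrame(
    {'North': [9000, 7500], 'South': [4000, 5200]},
    index=['Jan', 'Feb'],
)
demo['North_South'] = demo['North'] + demo['South']

# drop returns a new DataFrame; demo itself still has the column
trimmed = demo.drop(columns='North_South')
print('North_South' in demo.columns)
print('North_South' in trimmed.columns)

# Reassigning has the same effect as inplace=True
demo = demo.drop(columns='North_South')
```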

You can also drop rows this way:

In [None]:
df.drop('Jan',axis=0)

**Selecting rows:**

In [None]:
df.loc['Jan']

Or select by position instead of label:

In [None]:
df.iloc[0]

**Selecting a subset of rows and columns:**

In [None]:
df.loc['Mar','South']

In [None]:
df.loc[['Feb','Mar'],['East','West']]
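To make the label-versus-position distinction concrete, here is a sketch on a made-up frame (`demo` and all its values are hypothetical). It pulls the same row and the same block of cells once with `.loc` and once with `.iloc`.

```python
import pandas as pd

# Hypothetical data for illustration
demo = pd.DataFrame(
    {
        'North': [9000, 7500, 8200],
        'South': [4000, 5200, 6100],
        'East': [4800, 5600, 5100],
        'West': [3000, 3500, 3900],
    },
    index=['Jan', 'Feb', 'Mar'],
)

by_label = demo.loc['Feb']   # row selected by its label
by_position = demo.iloc[1]   # the same row selected by position

cell = demo.loc['Mar', 'South']                    # a single value
block = demo.loc[['Feb', 'Mar'], ['East', 'West']]  # rows and columns by label
same_block = demo.iloc[1:3, 2:4]                   # the same block by position

print(cell)
print(block)
```

Note that `.iloc` slices exclude the end position, as in plain Python.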

### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:

In [None]:
df

In [None]:
df>5000

In [None]:
df[df>5000]
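A sketch of what the two cells above produce, on a made-up frame (`demo` and its values are hypothetical). Comparing a whole DataFrame gives a DataFrame of booleans, and indexing with it keeps the shape while replacing the `False` positions with `NaN`.

```python
import pandas as pd

# Hypothetical data for illustration
demo = pd.DataFrame(
    {'North': [9000, 4500], 'South': [4000, 5200]},
    index=['Jan', 'Feb'],
)

flags = demo > 5000   # DataFrame of True/False, same shape as demo
masked = demo[flags]  # values where flags is False become NaN

print(flags)
print(masked)
```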

In [None]:
df[df['North']>8000]

In [None]:
df[df['North']>8000]['East']

In [None]:
df[df['North']>8000][['East','West']]

To combine two conditions, use `|` (or) and `&` (and), with each condition wrapped in parentheses:

In [None]:
df[(df['North']>8000) & (df['East'] > 5000)]
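The sketch below walks through the same pattern on a made-up frame (`demo` and its values are hypothetical). The parentheses matter because `&` and `|` bind more tightly than `>` in Python. The last line shows `.loc` with a mask and a column name, one way to filter rows and pick a column in a single step.

```python
import pandas as pd

# Hypothetical data for illustration
demo = pd.DataFrame(
    {'North': [9000, 7500, 8200], 'East': [4800, 5600, 5100]},
    index=['Jan', 'Feb', 'Mar'],
)

mask = demo['North'] > 8000  # boolean Series, one entry per row

both = demo[(demo['North'] > 8000) & (demo['East'] > 5000)]    # both conditions hold
either = demo[(demo['North'] > 8000) | (demo['East'] > 5000)]  # at least one holds

east_only = demo.loc[mask, 'East']  # filter rows and select a column together

print(both)
print(either)
print(east_only)
```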

## More Index Details

Let's discuss some more features of indexing, including resetting the index and setting it to something else. We'll also look at index hierarchy.

In [None]:
df

In [None]:
# Reset to the default 0, 1, ..., n-1 integer index (returns a new DataFrame)
df.reset_index()

In [None]:
newind = 'Jan19 Feb19 Mar19 Apr19 May19 Jun19 Jul19 Aug19 Sep19 Oct19 Nov19 Dec19'.split()

In [None]:
df['Year_19'] = newind

In [None]:
df

In [None]:
df.set_index('Year_19')

In [None]:
df

In [None]:
df.set_index('Year_19',inplace=True)

In [None]:
df
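As a compact recap, here is a sketch of the round trip on a made-up frame (`demo` and its values are hypothetical). Both `set_index` and `reset_index` return new DataFrames by default, so the results are assigned to new names.

```python
import pandas as pd

# Hypothetical data for illustration
demo = pd.DataFrame({'Month': ['Jan', 'Feb'], 'North': [9000, 7500]})

indexed = demo.set_index('Month')  # 'Month' moves from a column to the index
restored = indexed.reset_index()   # and back to an ordinary column

print(indexed)
print(restored)
```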

## Multi-Index and Index Hierarchy

Let us go over how to work with a MultiIndex. First we'll create a quick example of what a multi-indexed DataFrame looks like:

In [None]:
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)

In [None]:
hier_index

In [None]:
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df

Now let's see how to index this. For an index hierarchy on the rows we use `df.loc[]`; if the hierarchy were on the columns axis, you would use normal bracket notation, `df[]`. Selecting one level of the index returns the sub-DataFrame:

In [None]:
df.loc['G1']

In [None]:
df.loc['G1'].loc[1]

In [None]:
df.index.names

In [None]:
df.index.names = ['Group','Num']

In [None]:
df

In [None]:
df.xs('G1')

In [None]:
# Pass a tuple to select across several levels; recent pandas versions reject a list here
df.xs(('G1',1))

In [None]:
df.xs(1,level='Num')
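Since the frame above is filled with random numbers, the results are hard to check by eye. The sketch below repeats the cross-section calls on a deterministic frame; `demo`, `idx`, and the `np.arange` values are assumptions chosen only so the output is predictable.

```python
import numpy as np
import pandas as pd

# Same two-level layout as above, built with from_product and named levels
idx = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]], names=['Group', 'Num'])
demo = pd.DataFrame(np.arange(12).reshape(6, 2), index=idx, columns=['A', 'B'])

g1 = demo.xs('G1')              # all rows of group G1, indexed by Num
g1_row1 = demo.xs(('G1', 1))    # one row, addressed by a tuple of both levels
num1 = demo.xs(1, level='Num')  # Num == 1 from every group, indexed by Group

same_row = demo.loc[('G1', 1)]  # .loc with a tuple reaches the same row

print(g1)
print(g1_row1)
print(num1)
```

The `level` argument is what lets `xs` reach an inner level directly, which would otherwise take a more involved `.loc` expression.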