## Creating data
There are two core objects in pandas: the **DataFrame** and the **Series**.

**DataFrame**
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

In [1]:
# DataFrame example
# importing pandas library
import pandas as pd
# build a DataFrame from a dict that maps column names to lists of values
pd.DataFrame({'Yes':['50','21'], 'No':['131','2']})

   Yes   No
0   50  131
1   21    2


In [2]:
# DataFrame with string values
pd.DataFrame({'Bob':['I liked it','It was awful'],'Sue':['Pretty good','Bland']})

            Bob          Sue
0    I liked it  Pretty good
1  It was awful        Bland


The list of row labels used in a DataFrame is known as an index. We can assign values to it by using an index parameter in our constructor:

In [3]:
pd.DataFrame({'Bob':['I liked it','It was awful'],'Sue':['Pretty good','Bland']},
            index=['Product A','Product B'])

                    Bob          Sue
Product A    I liked it  Pretty good
Product B  It was awful        Bland


# Series
A Series is a sequence of data values. If a DataFrame is a table, a Series is a list, and you can create one with nothing more than a list:

In [4]:
# creating a Series from a list
pd.Series(['10','12','3','5'])

0    10
1    12
2     3
3     5
dtype: object

A Series is, in essence, a single column of a DataFrame. Row labels can be assigned to a Series using the index parameter, and the Series as a whole can be given a name using the name parameter.

In [None]:
pd.Series(['2016 sales','2015 sales'],name='Product A')
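The cell above sets only the name. As a minimal sketch, the index parameter can be combined with it so that each value gets its own row label. The sales figures below are made-up illustration values, not data from the notebook.

```python
import pandas as pd

# Hypothetical sales figures, used only for illustration
sales = pd.Series([30, 35, 40],
                  index=['2015 sales', '2016 sales', '2017 sales'],
                  name='Product A')

print(sales)
print(sales['2016 sales'])  # look up a value by its row label
```

The row labels play the same role here as the index of a DataFrame, while the name plays the role of a column name.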

In [None]:
# reading data, using the unnamed first column as the index
unnamed=pd.read_csv('data/bigmart.csv', index_col=[0])
unnamed.head()

In [None]:
# saving a dataframe to csv
data=pd.read_csv('data/bigmart.csv')
data.to_csv('save.csv')
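By default, to_csv() also writes the row index as a first column without a header, which is typically where a column called `Unnamed: 0` comes from when such a file is read back. The sketch below shows this round trip with an in-memory buffer instead of a file; the small frame is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# Default: the index is written as an extra first column with no header
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(reloaded.columns.tolist())

# index=False leaves the index out of the file
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
clean = pd.read_csv(without_index)
print(clean.columns.tolist())
```

Passing `index=False` when saving, or an `index_col` argument when reading, are two ways to avoid the extra column.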

In [None]:
# indexing in pandas
# using the loc and iloc indexers
# selecting the first row by position
data.iloc[0]

In [None]:
# getting a column with iloc
data.iloc[:,0]

In [None]:
# selecting the second and third entries
data.iloc[1:3,0]

In [None]:
# it is also possible to pass a list
data.iloc[[0,1,2],0]

In [None]:
# selecting the last five rows
data.iloc[-5:]

# Label-based selection
Label-based selection uses the loc operator. In this paradigm, it is the data's index value, not its position, that matters.

In [None]:
# getting the Item_Identifier of the row labeled 0
data.loc[0,'Item_Identifier']

In [None]:
# Listing specific columns
data.loc[:,['Item_Visibility','Item_MRP','Outlet_Size']]
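With the default integer index, loc and iloc can look interchangeable. The difference shows up once the row labels are not positions, and also in how slices are handled. A small sketch on a made-up frame with string labels:

```python
import pandas as pd

# Small made-up frame with string row labels
df = pd.DataFrame({'price': [10, 20, 30, 40]},
                  index=['a', 'b', 'c', 'd'])

by_position = df.iloc[0:2]   # positions 0 and 1; the end position is excluded
by_label = df.loc['a':'c']   # labels 'a' through 'c'; the end label is included

print(by_position.index.tolist())
print(by_label.index.tolist())
```

The iloc slice follows the usual Python convention of excluding the end, while the loc slice includes its end label.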

# Manipulating the Index
Label-based selection derives its power from the labels in the index. The index we use is not immutable, so we can change it whenever it suits the task.

The set_index() method can be used to make one of the columns the index:


In [None]:
data.set_index("Outlet_Establishment_Year")
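Note that set_index() returns a new DataFrame rather than changing the original, so the result has to be assigned to a name to be kept. A minimal sketch, using made-up rows whose column names mirror the ones used in this notebook:

```python
import pandas as pd

# Made-up rows; only the column names are taken from the notebook
df = pd.DataFrame({'Outlet_Establishment_Year': [1999, 2009, 1999],
                   'Item_MRP': [249.8, 48.3, 141.6]})

indexed = df.set_index('Outlet_Establishment_Year')

print(df.columns.tolist())   # the original frame still has both columns
print(indexed.index.name)    # the chosen column is now the index
print(indexed.loc[1999])     # label-based lookup returns every row labeled 1999
```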

## Conditional Selection

In [None]:
# checking whether each item is dairy or not
data.Item_Type =='Dairy'

In [None]:
#The result above can be used inside loc
data.loc[data.Item_Type =='Dairy']

In [None]:
# select dairy items from outlets with a Medium Outlet_Size
data.loc[(data.Item_Type == 'Dairy') & (data.Outlet_Size == 'Medium')]

In [None]:
# select dairy items from Medium outlets with an Item_Weight of at least 14
data.loc[(data.Item_Type == 'Dairy') & (data.Outlet_Size == 'Medium') & (data.Item_Weight >= 14)]

In [None]:
# items that are dairy or have an Item_Weight of at least 14
data.loc[(data.Item_Type == 'Dairy') | (data.Item_Weight >=14)]
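Each condition in the cells above is wrapped in parentheses because `&` and `|` bind more tightly than comparisons such as `==` and `>=` in Python. One way to keep longer filters readable is to name the boolean Series first. A sketch on a made-up frame that reuses two column names from the notebook:

```python
import pandas as pd

# Made-up rows; only the column names are taken from the notebook
df = pd.DataFrame({'Item_Type': ['Dairy', 'Meat', 'Dairy', 'Snacks'],
                   'Item_Weight': [9.3, 17.5, 15.0, 12.0]})

is_dairy = df.Item_Type == 'Dairy'   # boolean Series, one value per row
is_heavy = df.Item_Weight >= 14

both = df.loc[is_dairy & is_heavy]    # rows where both conditions hold
either = df.loc[is_dairy | is_heavy]  # rows where at least one holds

print(both)
print(either)
```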

Pandas comes with a few built-in conditional selectors:

A) isin - selects data whose value is in a list of values

In [None]:
# select products from Medium and Small
data.loc[data.Outlet_Size.isin(['Medium','Small'])]

B) isnull - highlights values which are empty (NaN)

   notnull - highlights values which are not empty

In [None]:
# select rows where Item_Weight is null
data.loc[data.Item_Weight.isnull()]
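isnull() and notnull() are complements of each other, so together they split the rows into two disjoint groups. A short sketch on made-up weights:

```python
import pandas as pd

# Made-up weights with two missing entries
df = pd.DataFrame({'Item_Weight': [9.3, float('nan'), 17.5, float('nan')]})

missing = df.loc[df.Item_Weight.isnull()]    # rows where the weight is NaN
present = df.loc[df.Item_Weight.notnull()]   # rows where the weight is available

print(len(missing), len(present))
```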

In [None]:
# item sales of at least 6000
data.loc[data.Item_Outlet_Sales>=6000]


# Summary Functions and Maps

## Summary Functions
1. To see a list of unique values

- reviews.taster_name.unique()

2. To see a list of unique values and how often they occur in the dataset

- reviews.taster_name.value_counts()
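The two summary functions listed above can be tried on a small stand-in for the reviews data. The frame and the taster names below are hypothetical, since the reviews dataset is not loaded in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the reviews dataset
reviews = pd.DataFrame(
    {'taster_name': ['Ann', 'Bob', 'Ann', 'Cid', 'Ann', 'Bob']})

names = reviews.taster_name.unique()         # distinct values, in order of first appearance
counts = reviews.taster_name.value_counts()  # occurrences per value, most frequent first

print(names)
print(counts)
```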

## Maps
A map is a function that takes one set of values and maps them to another set of values.
There are two mapping methods in pandas that are used often; map() is the first and slightly simpler one.

For example, suppose we wanted to remean the scores the wines received to 0. This can be done as follows:

In [8]:
# review_points_mean = reviews.points.mean()
# reviews.points.map(lambda p: p - review_points_mean)
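The cell above is commented out because the reviews data is not loaded here. A runnable sketch of the same idea, using a made-up `reviews` frame with a `points` column:

```python
import pandas as pd

# Hypothetical stand-in for the reviews dataset
reviews = pd.DataFrame({'points': [85, 90, 95]})

review_points_mean = reviews.points.mean()

# map() calls the function once per value and returns a new Series
centered = reviews.points.map(lambda p: p - review_points_mean)

# The same result, written as plain Series arithmetic
centered_vectorized = reviews.points - review_points_mean

print(centered.tolist())
```

For simple arithmetic like this, the subtraction form is usually shorter and faster; map() is more useful when the transformation cannot be written with built-in operators.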

apply() is the equivalent method if we want to transform a whole DataFrame by calling a custom method on each row.

In [9]:
# def remean_points(row):
#     row.points = row.points - review_points_mean
#     return row
# reviews.apply(remean_points, axis='columns')
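As a runnable sketch of the row-wise version, the example below uses a made-up `reviews` frame. Copying the row inside the function is a cautious choice made here, not something the original cell does.

```python
import pandas as pd

# Hypothetical stand-in for the reviews dataset
reviews = pd.DataFrame({'points': [85, 90, 95],
                        'country': ['Italy', 'France', 'Spain']})

review_points_mean = reviews.points.mean()

def remean_points(row):
    # Work on a copy so the row passed in by apply() is left alone
    row = row.copy()
    row['points'] = row['points'] - review_points_mean
    return row

# axis='columns' passes each row to the function as a Series
remeaned = reviews.apply(remean_points, axis='columns')

print(remeaned)
```

Because the function returns a whole row, the result is a new DataFrame with the same columns, and only `points` is changed.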

In [10]:
# # Combining two columns
# reviews.country + "-" + reviews.region_1