**This notebook is an exercise in the [Pandas](https://www.kaggle.com/learn/pandas) course.  You can reference the tutorial at [this link](https://www.kaggle.com/residentmario/creating-reading-and-writing).**

---


In [None]:
# Importing Pandas
import pandas as pd

# Creating Data

There are two core objects in pandas: the DataFrame and the Series.

## 1. DataFrame

* A DataFrame is a table, similar to a spreadsheet in MS Excel.
* It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.
* We use the pd.DataFrame() constructor to generate DataFrame objects.

In [None]:
# Simplest way, without index labels
fruits = pd.DataFrame({'Apples':[30, 50, 40], 
                       'Bananas':[21, 10, 15]})

fruits

In [None]:
# Creating DataFrames with user-defined index labels
fruits_1 = pd.DataFrame({'Apples':[30, 50, 40], 
                       'Bananas':[21, 10, 15]},
                       index=['January', 'February', 'March'])

fruits_1

In [None]:
# Another way to do the above: pass a list of rows plus the column names
fruits_2 = pd.DataFrame([[60, 35], [5, 15], [21, 36]], 
                      columns=['Apples', 'Bananas'],
                      index=['April', 'May', 'June'])
fruits_2

In [None]:
# Another way with simplified columns
items_AB = pd.DataFrame([[30, 21], [50, 10], [40, 15]], 
                      columns=list('AB'),
                      index=['January', 'February', 'March'])
items_AB

## Append rows

In [None]:
# DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
# Always assign the result to a variable, because a new DataFrame is returned
fruits = pd.concat([fruits_1, fruits_2])
fruits

In [None]:
# The original DataFrame is not modified
fruits_1

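Note: In the following case, even though the index labels are the same, the rows of the second DataFrame are added below the first, and its columns (A, B) become new columns instead of lining up with Apples and Bananas. The gaps are filled with NaN.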

In [None]:
grocery = pd.concat([fruits_1, items_AB])
grocery

## Append Column

In [None]:
fruits

In [None]:
fruits['Orange'] = [10,56,78,23,78,36]
fruits

In [None]:
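# insert() can add a column at any position (here position 0, the first column)
# Use a real missing value (float('nan')) rather than the string 'NaN'
fruits.insert(0, 'Pineapple', [1, 5, 0, 2, 8, float('nan')])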
fruits

## 2. Series

* A Series is a sequence of data values.
* If a DataFrame is a table, a Series is a list.
* You can also think of a Series as a single column of a DataFrame.
* A DataFrame can be considered a bunch of Series "glued together".

In [None]:
# Simple pandas series by using a list
pd.Series([1981, 5.2, 'Ann', 20000])

In [None]:
# An index can be assigned in the same way as for DataFrames
# A name can be given to the whole Series with the 'name' parameter
pd.Series([1981, 5.2, 'Ann', 2000],
         index=['Birth Year', 'Height', 'First Name', 'Paid'],
         name='Customer Details')
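
The "glued together" picture can be checked directly. The following is a minimal sketch with made-up fruit counts (the variable names are illustrative only): two Series that share an index are joined side by side into a DataFrame, and selecting one column gives a Series back.

```python
import pandas as pd

# Two Series sharing the same index labels (values are made up for illustration)
months = ['January', 'February', 'March']
apples = pd.Series([30, 50, 40], index=months, name='Apples')
bananas = pd.Series([21, 10, 15], index=months, name='Bananas')

# Joining the Series side by side (axis=1) yields a DataFrame;
# each Series name becomes a column name
glued = pd.concat([apples, bananas], axis=1)

# Selecting a single column of the DataFrame gives back a Series
column = glued['Apples']
print(type(glued).__name__, type(column).__name__)
```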

# Reading Data Files

In [None]:
# Reading a CSV file
file_path = '../input/wine-reviews/winemag-data-130k-v2.csv'
wine_reviews = pd.read_csv(file_path)
wine_reviews

In [None]:
# Ask pandas to use the first column of the CSV as the index
# read_csv() has more than 30 optional parameters similar to 'index_col'
wine_reviews = pd.read_csv(file_path, index_col=0)
wine_reviews

## shape

Use the 'shape' attribute to check the dimensions of the DataFrame as (number of rows, number of columns).

In [None]:
# shape is an attribute, not a method, so there are no parentheses
wine_reviews.shape

## head()

Use the 'head()' method to check the first few rows of the data. The required number of rows can be passed as a parameter; the default is 5.

In [None]:
# Remember this is a method
wine_reviews.head(3)

## describe()
Summarizes the data with statistics. describe() is type-aware: on a DataFrame it summarizes only the numeric columns by default, and its output changes with the data type of the column it is applied to.

In [None]:
wine_reviews.describe()
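
To see the type awareness of describe() without loading the wine data, here is a minimal sketch on a small made-up DataFrame (the column names mimic the dataset, the values are invented). It compares the default numeric summary with the summary of a text column.

```python
import pandas as pd

# Hypothetical mini dataset, not the wine reviews file
df = pd.DataFrame({'country': ['Italy', 'France', 'Italy'],
                   'points': [87, 90, 92]})

# Default: only numeric columns are summarized (count, mean, std, min, ...)
numeric_summary = df.describe()

# On a text column the summary switches to count, unique, top and freq
text_summary = df['country'].describe()

# include='all' asks for every column, with NaN where a statistic does not apply
all_summary = df.describe(include='all')

print(numeric_summary)
print(text_summary)
```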

## count(), mean(), std(), min(), max()

Individual statistics can be retrieved as follows



In [None]:
wine_reviews['price'].mean()
# wine_reviews['price'].count()
# wine_reviews['price'].std()
# wine_reviews['price'].min()
# wine_reviews['price'].max()

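## columns

Lists all column names. Note that 'columns' is an attribute, not a method, and it returns a pandas Index rather than a plain Python list.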

In [None]:
wine_reviews.columns

In [None]:
wine_reviews.columns[0]

## unique()
To see the unique values in a column

In [None]:
wine_reviews.country.unique()

## value_counts()
To see a list of unique values and how often they occur in the dataset

In [None]:
wine_reviews.country.value_counts()

# Data Access

In Python, we can access a property of an object as an attribute. Columns of a pandas DataFrame work the same way: a column can be accessed as an attribute of the DataFrame.

## 1. Native selection

In [None]:
# Straightforward way: attribute access
wine_reviews.country.head()

In [None]:
# Access similar to dictionary value
wine_reviews['country'].head()

An individual value can be accessed using the column name followed by the index label, as follows

In [None]:
# Access data using index
wine_reviews['country'][1]

**Pandas has its own access operators, 'loc' and 'iloc'. Both take the ROW selector first and the COLUMN selector second.**

## 2. iloc - index based selection

When we use iloc we treat the dataset like a big matrix (a list of lists)

In [None]:
# Selecting the 6th row (position 5; counting starts at 0)
wine_reviews.iloc[5]

In [None]:
# Selecting fifth (index 4) column
wine_reviews.iloc[:, 4]

How to use the : symbol to slice and dice
* : >> all items
* :3 >> from the beginning up to, but not including, position 3
* 3: >> from position 3 to the end
* -5: >> the last 5 items

In [None]:
# First 3 rows (positions 0 to 2), columns at positions 4 and 5
wine_reviews.iloc[:3, [4,5]]

In [None]:
# You can pass a list of specific row positions
wine_reviews.iloc[[5,8,7,15],0]
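
The slice forms listed above behave the same way on any DataFrame. A minimal sketch, assuming a made-up ten-row DataFrame so that the selected positions are easy to read off:

```python
import pandas as pd

# Ten rows holding the values 10..19 at positions 0..9 (illustrative data)
df = pd.DataFrame({'value': list(range(10, 20))})

everything = df.iloc[:]      # all rows
first_three = df.iloc[:3]    # positions 0, 1, 2 (position 3 is excluded)
from_three = df.iloc[3:]     # position 3 through the last row
last_five = df.iloc[-5:]     # the final five rows

print(first_three['value'].tolist())
print(last_five['value'].tolist())
```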

## 3. loc - label based selection

 Since your dataset usually has meaningful indices, it's usually easier to do things using loc instead. 

In [None]:
wine_reviews.loc[5, ['country', 'province', 'region_1', 'region_2']]
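
One difference between the two operators is worth keeping in mind when slicing: iloc excludes the stop position, as Python lists do, while loc includes the stop label. A minimal sketch on a made-up DataFrame with month labels as the index:

```python
import pandas as pd

# Illustrative data with meaningful index labels
df = pd.DataFrame({'Apples': [30, 50, 40, 60]},
                  index=['January', 'February', 'March', 'April'])

# Position based: rows at positions 0 and 1; the stop position 2 is excluded
by_position = df.iloc[0:2]

# Label based: rows from 'January' through 'March'; the stop label is included
by_label = df.loc['January':'March']

print(list(by_position.index))
print(list(by_label.index))
```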

The index is not immutable, i.e. we can replace it, for example with one of the columns

In [None]:
# Set 'winery' as the new index; the result is a new DataFrame
wine_reviews.set_index('winery')
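
Because set_index() returns a new DataFrame, the change is lost unless the result is assigned. A minimal sketch with a made-up two-row DataFrame (the winery names are placeholders):

```python
import pandas as pd

# Hypothetical data; 'winery' and 'points' mimic the wine reviews columns
wines = pd.DataFrame({'winery': ['A', 'B'], 'points': [87, 90]})

# The result carries the new index...
reindexed = wines.set_index('winery')

# ...while the original still has 'winery' as an ordinary column
original_columns = list(wines.columns)

# Assign the result back to keep the new index
wines = wines.set_index('winery')
print(original_columns, wines.index.name)
```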

Conditional selection

In [None]:
# AND (&) logical operator
wine_reviews.loc[(wine_reviews.country == 'Italy') & (wine_reviews.points >= 99)]

In [None]:
# OR (|) logical operator
wine_reviews.loc[(wine_reviews.country == 'Italy') | (wine_reviews.points >= 99)]

'isin([list])' to check against a list of values

In [None]:
# Selection based on the country list ['Italy', 'France', 'Spain']
wine_reviews.loc[wine_reviews.country.isin(['Italy', 'France', 'Spain'])]

'isnull()' and 'notnull()' to find out null values and not null values

In [None]:
# Select the entries where price is null
wine_reviews.loc[wine_reviews.price.isnull()]

In [None]:
# Select the entries where price is NOT null
wine_reviews.loc[wine_reviews.price.notnull()]

# Assigning Data

## Replacing null values

In [None]:
# Name the column in loc so that only 'price' is overwritten, not the whole row
wine_reviews.loc[wine_reviews.price.isnull(), 'price'] = 0
# No rows should be left with a null price
wine_reviews.loc[wine_reviews.price.isnull()]

## Column calculations

In [None]:
price = wine_reviews.price
price

In [None]:
median = wine_reviews.price.median()
median

In [None]:
# Difference between each price and the median price computed above
price_minus_median = wine_reviews.price - median
price_minus_median

## Maps

Create new columns based on existing columns. For example, wines can be chosen based on their rating/price ratio.
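
A minimal sketch of the idea, assuming a made-up three-row DataFrame (the titles, the 'points_per_price' and 'price_band' column names, and the price threshold of 20 are all illustrative, not taken from the dataset). Plain column arithmetic builds the ratio, and Series.map() applies a function to each value.

```python
import pandas as pd

# Hypothetical wines; the columns mimic the wine reviews dataset
wines = pd.DataFrame({'title': ['Wine A', 'Wine B', 'Wine C'],
                      'points': [90, 88, 95],
                      'price': [30.0, 11.0, 95.0]})

# Column arithmetic works element by element and creates the new column
wines['points_per_price'] = wines['points'] / wines['price']

# map() transforms each value of a Series with the given function
wines['price_band'] = wines['price'].map(
    lambda p: 'budget' if p < 20 else 'premium')

# idxmax() returns the index label of the row with the highest ratio
best = wines.loc[wines['points_per_price'].idxmax(), 'title']
print(best)
```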