In [None]:
import pandas

### pandas.Series

A pandas Series can be created using the constructor **pandas.Series(data, index, dtype, copy)**.

#### Parameter and Description

**data** : Data can take various forms, such as an ndarray, list, dict, or scalar constant.

**index** : Index values must be hashable, and the index must be the same length as data. Duplicate labels are allowed. Defaults to **np.arange(n)** if no index is passed.

**dtype** : dtype is for data type. If None, data type will be inferred

**copy** : Copy data. Default False
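
As a minimal sketch of how these parameters work together (the values and the labels 'x', 'y', 'z' are made up for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative input only; any 1-D array would do.
values = np.array([1, 2, 3])

# index supplies the labels, dtype forces float storage,
# and copy=True asks for a Series that owns its own data.
s = pd.Series(values, index=['x', 'y', 'z'], dtype='float64', copy=True)

values[0] = 99   # changing the source array afterwards does not affect s

print(s)
print(s.dtype)
print(s['x'])
```

Because the data is copied, the later change to the source array leaves the Series untouched.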

In [None]:
pandas.Series?

A series can be created using various inputs like:

- Array
- Dict
- Scalar value or constant

### Create an Empty Series

The simplest Series that can be created is an empty Series.

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
s = pd.Series()
print(s)

### Create a Series from ndarray

If data is an ndarray, any index passed must have the same length as the array. If no index is passed, the default index is **range(n)**, where n is the array length, i.e., **[0, 1, 2, ..., len(array)-1]**.

#### Example

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

We did not pass any index, so by default, it assigned the indexes ranging from 0 to len(data)-1, i.e., 0 to 3.

#### Example 2

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
s

We passed the index values here. Now we can see the customized indexed values in the output.

### Create a Series from dict

A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (older pandas releases sorted the keys; recent releases keep the dictionary's insertion order). If an index is passed, the values in data corresponding to the labels in the index are pulled out.

#### Example

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
s

Observe − Dictionary keys are used to construct index.

#### Example 2

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
s

Observe − Index order is persisted and the missing element is filled with NaN (Not a Number).
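
The following sketch, which uses illustrative data with keys deliberately out of alphabetical order, lets you check the key order your installed pandas version produces and shows how to detect the NaN introduced by a label that has no matching key:

```python
import pandas as pd

# Keys are intentionally not in sorted order.
data = {'b': 1.0, 'a': 0.0, 'c': 2.0}

# Without an index, the labels come from the dictionary keys.
s_default = pd.Series(data)
print(list(s_default.index))

# With an index, values are looked up by label; 'd' has no key, so it becomes NaN.
s_reindexed = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s_reindexed.isna())
```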

### Create a Series from Scalar

If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
s

### Accessing Data from Series with Position

Data in the series can be accessed similar to that in an **ndarray**.

#### Example 1
Retrieve the first element. As we already know, the counting starts from zero for the array, which means the first element is stored at zeroth position and so on.

In [None]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the first element by position
#(plain s[0] on a label index is deprecated in recent pandas releases)
s.iloc[0]

#### Example 2

Retrieve the first three elements in the Series. A slice written as **:n** extracts everything from the start up to, but not including, position n, while **n:** extracts everything from position n onwards. With two values separated by a colon (**start:stop**), the items from start up to, but not including, stop are returned.

In [None]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the first three elements
print(s[:3])

#### Example 3

Retrieve the last three elements.

In [None]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve the last three elements
print(s[-3:])
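
The positional lookups above can also be written with the **iloc** indexer, which always works by position regardless of the index labels. A short sketch using the same Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]          # first element by position
first_three = s.iloc[:3]   # positions 0, 1, 2 (stop position excluded)
middle = s.iloc[1:4]       # positions 1, 2, 3
last_three = s.iloc[-3:]   # the final three positions

print(first)
print(first_three.tolist())
print(middle.tolist())
print(last_three.tolist())
```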

### Retrieve Data Using Label (Index)

A Series is like a fixed-size dict in that you can get and set values by index label.

#### Example 1

Retrieve a single element using index label value.

In [None]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve a single element
print(s['a'])

#### Example 2

Retrieve multiple elements using a list of index label values.

In [None]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#retrieve multiple elements
print(s[['a','c','d']])

#### Example 3

If a label is not contained in the index, a **KeyError** exception is raised.

In [None]:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

#look up a label that is not in the index (raises KeyError)
print(s['f'])
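
When a label may be absent, the lookup can be guarded. The sketch below shows two common options: catching the KeyError, and using **Series.get**, which returns a default value instead of raising. The fallback value 0 is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Option 1: catch the exception raised for a missing label.
try:
    value = s['f']
except KeyError:
    value = None
    print("label 'f' is not in the index")

# Option 2: Series.get returns the default instead of raising.
fallback = s.get('f', default=0)
print(fallback)

# Membership tests on a Series check the index labels, not the values.
print('f' in s)
```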

## Python Pandas - DataFrame

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

Features of DataFrame:

- Columns can be of different types
- Size is mutable
- Labeled axes (rows and columns)
- Arithmetic operations can be performed on rows and columns

### Structure:
    
You can think of it as an SQL table or a spreadsheet data representation.

### pandas.DataFrame

A pandas DataFrame can be created using the constructor **pandas.DataFrame(data, index, columns, dtype, copy)**.

The parameters of the constructor are as follows:

**data** : Data can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.

**index** : Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.

**columns** : Column labels for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.

**dtype** : Data type of each column.

**copy** : Whether to copy the input data. Default False.
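
A small sketch of these parameters in use, assuming a 2x2 array and made-up labels ('r1', 'r2', 'a', 'b'):

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# No labels passed: rows and columns both get default integer labels.
plain = pd.DataFrame(arr)

# Explicit row labels, column labels and dtype; copy=True detaches the frame from arr.
labeled = pd.DataFrame(arr, index=['r1', 'r2'], columns=['a', 'b'],
                       dtype='float64', copy=True)

arr[0, 0] = 99   # later changes to arr do not reach the copied frame

print(list(plain.index), list(plain.columns))
print(labeled)
```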

## Create DataFrame

A pandas DataFrame can be created using various inputs like:

- Lists
- dict
- Series
- Numpy ndarrays
- Another DataFrame

In the subsequent sections of this chapter, we will see how to create a DataFrame using these inputs.

#### Create an Empty DataFrame

The simplest DataFrame that can be created is an empty DataFrame.

Example

In [None]:
#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print(df)

### Create a DataFrame from Lists

The DataFrame can be created using a single list or a list of lists.

Example 1

In [None]:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

#### Example 2

In [None]:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

#### Example 3

In [None]:
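import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
#cast only the numeric column; passing dtype=float to the constructor
#can fail in recent pandas releases because 'Name' holds strings
df = df.astype({'Age': float})
print(df)

Note − Observe that astype changes the type of the Age column to floating point.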

### Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, then by default, index will be range(n), where n is the array length.

#### Example 1

In [None]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

Note − Observe the values 0, 1, 2, 3. They are the default index assigned to the rows, equivalent to range(n).

#### Example 2

Let us now create an indexed DataFrame using arrays.

In [None]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

Note − Observe, the index parameter assigns an index to each row.

### Create a DataFrame from List of Dicts

A list of dictionaries can be passed as input data to create a DataFrame. The dictionary keys are by default taken as column names.

Example 1

The following example shows how to create a DataFrame by passing a list of dictionaries.

In [None]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Note − Observe that NaN (Not a Number) fills the positions where a dictionary has no value for a key.

Example 2

The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.

In [None]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

Example 3

The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.

In [None]:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

Note − Observe that df2 is created with a column label ('b1') that is not a dictionary key, so that column is filled with NaN. df1, by contrast, is created with column labels that match the dictionary keys, so no NaN values appear.

### Create a DataFrame from Dict of Series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all the series indexes passed.

Example:

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

Note − Observe that the series one has no label ‘d’, so in the result the entry for label d in column one is NaN.
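
The alignment can be inspected directly. This sketch rebuilds the same frame and checks the resulting index, the number of missing entries, and the column types:

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
two = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'one': one, 'two': two})

print(list(df.index))           # union of both indexes
print(df['one'].isna().sum())   # number of missing entries in column 'one'
print(df.dtypes)                # 'one' is stored as float so it can hold NaN
```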

Let us now understand **column selection**, **addition**, and **deletion** through examples.

### Column Selection

We will understand this by selecting a column from the DataFrame.

Example

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])

### Column Addition

We will understand this by adding a new column to an existing data frame.

Example

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label by passing new series

print("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)

print("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print(df)

### Column Deletion

Columns can be deleted or popped; let us take an example to understand how.

Example

In [None]:
# Using the previous DataFrame, we will delete a column
# using del function
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
   'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)

# using del function
print("Deleting the first column using DEL function:")
del df['one']
print(df)

# using pop function
print("Deleting another column using POP function:")
df.pop('two')
print(df)
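
Both **del** and **pop** modify the DataFrame in place. When the original frame should be kept, **drop** returns a new DataFrame instead. A brief sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

# pop removes the column in place and returns it as a Series.
popped = df.pop('two')

# drop returns a new DataFrame and leaves df untouched.
trimmed = df.drop(columns=['one'])

print(popped.tolist())
print(list(df.columns))
print(list(trimmed.columns))
```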

### Row Selection, Addition, and Deletion

We will now understand row selection, addition and deletion through examples. Let us begin with the concept of selection.

#### Selection by Label

Rows can be selected by passing a row label to the **loc** indexer.

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.loc['b'])

The result is a Series whose labels are the column names of the DataFrame, and the name of the Series is the label with which it was retrieved.

#### Selection by integer location

Rows can be selected by passing an integer location to the **iloc** indexer.

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.iloc[2])

#### Slice Rows

Multiple rows can be selected using ‘ : ’ operator.

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df[2:4])
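
Row slices can also be written explicitly with **iloc** (by position) or **loc** (by label). Note that position slices exclude the stop position, while label slices include both endpoints. A sketch with a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

by_position = df.iloc[2:4]    # rows at positions 2 and 3
by_label = df.loc['b':'c']    # label slices include both 'b' and 'c'

print(by_position)
print(by_label)
```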

#### Addition of Rows

Add new rows to a DataFrame by concatenating it with another DataFrame using **pd.concat**. The new rows are placed at the end. (The older **DataFrame.append** method was removed in pandas 2.0.)

In [None]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])
print(df)

#### Deletion of Rows

Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, multiple rows are dropped.

In the above example, the labels are duplicated. Let us drop a label and see how many rows get dropped.

In [None]:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)

print(df)

In the above example, two rows were dropped because those two contain the same label 0.
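
If duplicate labels are not wanted, the rows can be renumbered while combining the frames. A sketch using **pd.concat** with ignore_index=True, after which dropping label 0 removes only one row:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

kept = pd.concat([df, df2])                             # labels 0, 1, 0, 1
renumbered = pd.concat([df, df2], ignore_index=True)    # labels 0, 1, 2, 3

print(len(kept.drop(0)))         # two rows remain
print(len(renumbered.drop(0)))   # three rows remain
```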

## Python Pandas - Panel

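A **panel** is a 3D container of data. The term **Panel data** is derived from econometrics and is partially responsible for the name pandas − pan(el)-da(ta)-s. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the code in this section only runs on older releases.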

The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are:

- **items** − axis 0, each item corresponds to a DataFrame contained inside.

- **major_axis** − axis 1, it is the index (rows) of each of the DataFrames.

- **minor_axis** − axis 2, it is the columns of each of the DataFrames.

### pandas.Panel()

A Panel can be created using the constructor **pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)**.

The parameters of the constructor are as follows:

**data** : Data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame

**items** : axis=0

**major_axis** : axis=1

**minor_axis** : axis=2

**dtype** : Data type of each column

**copy** : Copy data. Default False

### Create Panel

A Panel can be created using multiple ways like:

- From ndarrays
- From dict of DataFrames

### From 3D ndarray

In [None]:
# creating a panel from a 3D ndarray (requires a pandas release older than 0.25)
import pandas as pd
import numpy as np

data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)
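
On pandas releases without Panel, the same 3D data can be held in a DataFrame with a two-level row index. The following is one possible sketch, not the only approach; the level names 'item' and 'major' are chosen here for illustration:

```python
import numpy as np
import pandas as pd

data = np.random.rand(2, 4, 5)   # items x major_axis x minor_axis

# One DataFrame per item, stacked under an outer 'item' index level.
frames = {item: pd.DataFrame(data[item]) for item in range(data.shape[0])}
stacked = pd.concat(frames, names=['item', 'major'])

print(stacked.shape)          # all rows of all items
print(stacked.loc[0].shape)   # the 2D frame for item 0
```

Selecting an item with `stacked.loc[item]` returns an ordinary DataFrame, much like indexing a Panel by item did.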

## Python Pandas - Basic Functionality

By now, we have learnt about the three pandas data structures and how to create them. We will mainly focus on **DataFrame** objects because of their importance in real-world data processing.

### Series Basic Functionality

**axes** : Returns a list of the row axis labels
    
**dtype** : Returns the dtype of the object.
    
**empty** : Returns True if series is empty. 

**ndim** : Returns the number of dimensions of the underlying data, by definition 1.

**size** : Returns the number of elements in the underlying data.

**values** : Returns the Series as ndarray.

**head()** : Returns the first n rows.

**tail()** : Returns the last n rows.

In [None]:
import pandas as pd
import numpy as np

#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The axes are:")
print(s.axes)

In [None]:
s.dtype

In [None]:
s.empty

In [None]:
s.ndim

In [None]:
s.size

In [None]:
s.values

In [None]:
s.head()

In [None]:
s.tail()

### DataFrame Basic Functionality

Let us now look at the basic functionality of DataFrame. The following list covers the important attributes and methods.

**T** : Transposes rows and columns.

**axes** : Returns a list with the row axis labels and column axis labels as the only members.

**dtypes** : Returns the dtypes in this object.

**empty** : True if the DataFrame is entirely empty (no items), i.e., if any of the axes is of length 0.

**ndim** : Number of axes / array dimensions.

**shape** : Returns a tuple representing the dimensionality of the DataFrame.

**size** : Number of elements in the NDFrame.

**values** : Numpy representation of NDFrame.

**head()** : Returns the first n rows.

**tail()** : Returns last n rows.

Let us now create a DataFrame and see how all the above-mentioned attributes operate.

In [None]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print("Our DataFrame is:")
print(df)

In [None]:
df.T

In [None]:
df.axes

In [None]:
df.dtypes

In [None]:
df.ndim

In [None]:
df.shape

In [None]:
df.size

In [None]:
df.values

In [None]:
df.head(2)

In [None]:
df.tail()

## Python Pandas - Descriptive Statistics

A large number of methods compute descriptive statistics and other related operations on a DataFrame.

Most of these are aggregations like sum() and mean(), but some of them, like cumsum(), produce an object of the same size.

Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified by name or integer.
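
As a short sketch of the axis argument and of the difference between an aggregation and a same-size result (the frame below is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

col_totals = df.sum(axis='index')     # same as axis=0: one value per column
row_totals = df.sum(axis='columns')   # same as axis=1: one value per row
running = df.cumsum()                 # same shape as df

print(col_totals.tolist())
print(row_totals.tolist())
print(running)
```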

Example

In [None]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df)

**sum( )** : Returns the sum of the values for the requested axis. By default, axis is index (axis=0).

In [None]:
import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.sum())

Each column is summed individually (strings are concatenated).

**axis=1** : Sums across the columns, giving one value per row, as shown below.

In [None]:

import pandas as pd
import numpy as np
 
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}
 
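#Create a DataFrame
df = pd.DataFrame(d)
#numeric_only=True skips the string column 'Name', which cannot be added to numbers
print(df.sum(axis=1, numeric_only=True))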

In [None]:
df.mean(numeric_only=True)

In [None]:
df.mean(axis=1, numeric_only=True)

In [None]:
df.std(numeric_only=True)

### Functions & Description

Let us now understand the functions under Descriptive Statistics in Python Pandas. The following list covers the important functions:

**count()** : Number of non-null observations

**sum()** : Sum of values

**mean()** : Mean of Values

**median()** : Median of Values

**mode()** : Mode of values

**std()** : Standard Deviation of the Values

**min()** : Minimum Value

**max()** : Maximum Value

**abs()** : Absolute Value

**prod()** : Product of Values

**cumsum()** : Cumulative Sum

**cumprod()** : Cumulative Product
    

**Note** − Since a DataFrame is a heterogeneous data structure, generic operations don’t work with all functions.

Functions like **sum()** and **cumsum()** work with both numeric and character (or string) data elements without any error. Though in practice character aggregations are rarely used, these functions do not throw any exception.

Functions like **abs()** and **cumprod()** throw an exception when the DataFrame contains character or string data because such operations cannot be performed.
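
One way to avoid these exceptions is to restrict the frame to its numeric columns first. A sketch using **select_dtypes** on a reduced, illustrative version of the data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James'],
                   'Age': [25, 26],
                   'Rating': [4.23, 3.24]})

# Keep only the numeric columns; 'Name' is left out.
numeric = df.select_dtypes(include='number')

print(numeric.abs())
print(numeric.cumprod())
```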

### Summarizing Data

The **describe( )** function computes a summary of statistics pertaining to the DataFrame columns.

In [None]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())

This function gives the **count**, **mean**, **std**, **min**, **max** and quartile (25%, 50%, 75%) values. By default, it excludes the character columns and summarizes only the numeric columns. **'include'** is the argument used to specify which columns should be considered for summarizing. It takes a list of values; by default, 'number'.

**object** − Summarizes String columns

**number** − Summarizes Numeric columns

**all** − Summarizes all columns together (Should not pass it as a list value)

Now, use the following statement in the program and check the output:

In [None]:

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include=['object']))

Now, use the following statement and check the output:

In [None]:
import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe(include='all'))

## Python Pandas - IO Tools

The Pandas I/O API is a set of top level reader functions accessed like **pd.read_csv( )** that generally return a Pandas object.

The two workhorse functions for reading text files (or flat files) are **read_csv( )** and **read_table( )**. They both use the same parsing code to intelligently convert tabular data into a DataFrame object.
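
A minimal sketch of both readers, using in-memory text in place of a file so that it runs without any external data (the sample rows are made up):

```python
import io
import pandas as pd

# Illustrative comma-separated text standing in for a file on disk.
csv_text = "Name,Age,Rating\nTom,25,4.23\nJames,26,3.24\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)

# read_table expects tab-separated text by default.
tsv_text = csv_text.replace(",", "\t")
df_tab = pd.read_table(io.StringIO(tsv_text))
print(df_tab.equals(df))
```

Passing a file path instead of the StringIO object reads from disk in the same way.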

#### (to be contd ...)