<a href="https://colab.research.google.com/github/owaisahmad315/pandas/blob/main/Data_Frame_Methods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [76]:
# The data for this section is sample retail sales data:
import pandas as pd
from io import StringIO

data = StringIO(
    '''UPC,Units,Sales,Date
    1234,5,20.2,1-1-2014
    1234,2,8.,1-2-2014
    1234,3,13.,1-3-2014
    789,1,2.,1-1-2014
    789,2,3.8,1-2-2014
    789,,,1-3-2014
    789,1,1.8,1-5-2014'''
)

sales = pd.read_csv(data)
sales

,UPC,Units,Sales,Date
0,1234,5.0,20.2,1-1-2014
1,1234,2.0,8.0,1-2-2014
2,1234,3.0,13.0,1-3-2014
3,789,1.0,2.0,1-1-2014
4,789,2.0,3.8,1-2-2014
5,789,,,1-3-2014
6,789,1.0,1.8,1-5-2014


In [77]:
# Data Frame Attributes
"""
Let's dig in a little more. We can examine the axes of a data frame by
looking at the .axes attribute:

"""
sales.axes

[RangeIndex(start=0, stop=7, step=1),
 Index(['UPC', 'Units', 'Sales', 'Date'], dtype='object')]

In [78]:
# The .axes attribute is a list holding the index and the columns, each of which is also available on its own:
sales.index

RangeIndex(start=0, stop=7, step=1)

In [79]:
sales.columns

Index(['UPC', 'Units', 'Sales', 'Date'], dtype='object')

In [80]:
# The number of rows and columns is also available via the .shape attribute:
sales.shape

(7, 4)
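
As a quick check of how these attributes relate, the sketch below builds a small frame and confirms that `.axes` is simply `[index, columns]` and that `.shape` holds their lengths. The name `df` and its values are illustrative assumptions, not the `sales` data above.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration
df = pd.DataFrame({'UPC': [1234, 789], 'Units': [5.0, 1.0]})

index, columns = df.axes       # axis 0 is the index, axis 1 the columns
n_rows, n_cols = df.shape      # lengths of those two axes

print(index.equals(df.index))
print(columns.equals(df.columns))
print((n_rows, n_cols) == (len(df.index), len(df.columns)))
```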

In [81]:
# The .info method prints a summary of the columns, their dtypes, and non-null counts:
sales.info()

In [82]:
# Iteration
"""
Data frames include a variety of methods to iterate over the values. By
default, iteration occurs over the column names:

"""
for column in sales:
  print(column)

UPC
Units
Sales
Date


In [83]:
'''
The .items method (called .iteritems before pandas 2.0, where the old
name was removed) returns pairs of column names and the
individual column (as a Series):

'''
for col, ser in sales.items():
  print(col, ser)

UPC 0    1234
1    1234
2    1234
3     789
4     789
5     789
6     789
Name: UPC, dtype: int64
Units 0    5.0
1    2.0
2    3.0
3    1.0
4    2.0
5    NaN
6    1.0
Name: Units, dtype: float64
Sales 0    20.2
1     8.0
2    13.0
3     2.0
4     3.8
5     NaN
6     1.8
Name: Sales, dtype: float64
Date 0    1-1-2014
1    1-2-2014
2    1-3-2014
3    1-1-2014
4    1-2-2014
5    1-3-2014
6    1-5-2014
Name: Date, dtype: object
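
Iterating over (name, Series) pairs is handy for building a per-column summary. The sketch below uses `.items` (the current spelling of `.iteritems`, which was removed in pandas 2.0) on a hypothetical frame `df` and records each column's dtype and number of missing values. The `summary` dict is an illustrative name, not part of pandas.

```python
import pandas as pd

# Hypothetical frame with one missing value
df = pd.DataFrame({
    'UPC': [1234, 789],
    'Units': [5.0, float('nan')],
})

# Map each column name to (dtype, count of missing values)
summary = {}
for name, column in df.items():
    summary[name] = (str(column.dtype), int(column.isna().sum()))

print(summary)
```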



In [84]:
'''
The .iterrows method returns a tuple for every row. The tuple has two
items. The first is the index value. The second is the row converted into a
Series object. This might be a little tricky in practice because a row's
values might not be homogeneous, whereas that is usually the case in a
column of data. Notice that the dtype for the row series is object because
the row has strings and numeric values in it:


'''
for row in sales.iterrows():
  print(row)
  break # limit data

(0, UPC          1234
Units         5.0
Sales        20.2
Date     1-1-2014
Name: 0, dtype: object)
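
The same conversion affects all-numeric rows too. In the sketch below (a hypothetical frame `df` with one integer and one float column), the row Series returned by `.iterrows` needs a single dtype, so the integer value comes back as a float.

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column
df = pd.DataFrame({'UPC': [1234, 789], 'Units': [5.0, 1.0]})

# Take only the first (index, row) pair
_, first_row = next(df.iterrows())

print(df['UPC'].dtype)    # integer dtype in the column
print(first_row.dtype)    # float64: the row was upcast to one common dtype
print(first_row['UPC'])   # 1234.0
```

If the original dtypes matter, iterating with `.itertuples` avoids building a Series per row.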


In [85]:
'''
The .itertuples method returns a namedtuple containing the index and
row values:

'''
for row in sales.itertuples():
  print(row)

Pandas(Index=0, UPC=1234, Units=5.0, Sales=20.2, Date='1-1-2014')
Pandas(Index=1, UPC=1234, Units=2.0, Sales=8.0, Date='1-2-2014')
Pandas(Index=2, UPC=1234, Units=3.0, Sales=13.0, Date='1-3-2014')
Pandas(Index=3, UPC=789, Units=1.0, Sales=2.0, Date='1-1-2014')
Pandas(Index=4, UPC=789, Units=2.0, Sales=3.8, Date='1-2-2014')
Pandas(Index=5, UPC=789, Units=nan, Sales=nan, Date='1-3-2014')
Pandas(Index=6, UPC=789, Units=1.0, Sales=1.8, Date='1-5-2014')
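
Because each row is a namedtuple, its fields can be read by attribute. The sketch below computes a price per unit for each row of a hypothetical frame `df`. The `index=False` and `name='Sale'` arguments are optional parameters of `.itertuples` that drop the index field and rename the tuple class, and `unit_prices` is an illustrative name.

```python
import pandas as pd

# Hypothetical frame without missing values
df = pd.DataFrame({
    'UPC': [1234, 789],
    'Units': [5.0, 1.0],
    'Sales': [20.2, 2.0],
})

# Map each UPC to its sales amount divided by units sold
unit_prices = {}
for row in df.itertuples(index=False, name='Sale'):
    unit_prices[row.UPC] = row.Sales / row.Units

print(unit_prices)
```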


In [86]:
'''
If you aren't familiar with namedtuples in Python, take a look at
collections.namedtuple in the standard library. They give you all the
benefits of a tuple: immutability, low memory requirements, and index
access. In addition, a namedtuple allows you to access values by attribute:


'''
import collections

Sales = collections.namedtuple('Sales',
                               'upc,units,sales')
s = Sales(1234, 5., 20.2)
s[0] # index access
s.upc # attribute access


1234
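
A few more namedtuple behaviors are worth seeing side by side. This sketch uses a hypothetical `Sale` type to show tuple unpacking, immutability, and the standard `_replace` and `_asdict` helpers.

```python
import collections

# Hypothetical record type for illustration
Sale = collections.namedtuple('Sale', 'upc,units,sales')
s = Sale(1234, 5.0, 20.2)

upc, units, total = s              # unpacks like a regular tuple

try:
    s.units = 6.0                  # fields cannot be reassigned
except AttributeError as exc:
    print('cannot assign:', exc)

updated = s._replace(units=6.0)    # returns a new tuple; s is unchanged
print(updated)
print(s._asdict())                 # field name -> value mapping
```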

## Matrix Operations


In [87]:
'''
 The data frame can be treated as a matrix. There is support for transposing
a matrix:


'''

sales.transpose()

,0,1,2,3,4,5,6
UPC,1234,1234,1234,789,789,789,789
Units,5.0,2.0,3.0,1.0,2.0,,1.0
Sales,20.2,8.0,13.0,2.0,3.8,,1.8
Date,1-1-2014,1-2-2014,1-3-2014,1-1-2014,1-2-2014,1-3-2014,1-5-2014


In [None]:
'''
The .T property of a data frame is a convenient shorthand for the
.transpose method. It comes in handy when examining a data frame in a
Jupyter (IPython) notebook: viewing the column headers along the
left-hand side often makes the data more compact and easier to read.

'''
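# The dot product can be called on a data frame if the contents are numeric.
# The Date column holds strings, so select the numeric columns first:
numeric = sales[['Units', 'Sales']]
numeric.dot(numeric.T)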
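
Because `.dot` requires numeric contents, a frame with string columns such as Date has to be reduced to its numeric columns before the call. A minimal sketch on a hypothetical two-row frame `m` shows the result of multiplying a frame by its transpose and checks it against NumPy's matrix product.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame
m = pd.DataFrame({'Units': [5.0, 2.0], 'Sales': [20.0, 8.0]})

# The columns of m align with the index of m.T, giving a 2x2 result
gram = m.dot(m.T)
print(gram)

# Same numbers as the plain NumPy matrix product
values = m.to_numpy()
print(np.allclose(gram.to_numpy(), values @ values.T))
```

Each entry of the result is the dot product of two rows, so the entry at row 0, column 1 is 5*2 + 20*8 = 170.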

## Serialization

In [91]:
'''
Data frames can serialize to many forms. The most important functionality
is probably converting to and from a CSV file, as this format is the lingua
franca of data. We already saw that the pd.read_csv function will create a
DataFrame. Writing to CSV is just as easy with the .to_csv method:

'''
fout = StringIO()
sales.to_csv(fout, index_label='index')
print(fout.getvalue())

index,UPC,Units,Sales,Date
0,1234,5.0,20.2,1-1-2014
1,1234,2.0,8.0,1-2-2014
2,1234,3.0,13.0,1-3-2014
3,789,1.0,2.0,1-1-2014
4,789,2.0,3.8,1-2-2014
5,789,,,1-3-2014
6,789,1.0,1.8,1-5-2014
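
To see the CSV round trip end to end, the sketch below writes a hypothetical frame `df` to an in-memory buffer and reads it back. Passing `index=False` is a choice made here so the index is not written as an extra column; missing values are written as empty fields and come back as NaN.

```python
from io import StringIO

import pandas as pd

# Hypothetical frame with a row of missing values
df = pd.DataFrame({
    'UPC': [1234, 789],
    'Units': [5.0, float('nan')],
    'Sales': [20.2, float('nan')],
})

buf = StringIO()
df.to_csv(buf, index=False)   # write without the index column
print(buf.getvalue())

buf.seek(0)                   # rewind before reading
round_trip = pd.read_csv(buf)
print(round_trip)
```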



In [None]:
# By default, the .to_dict method maps each column name to a dict of index -> value:
sales.to_dict()

In [93]:
"""
An optional parameter orient can create a mapping of column name to
a list of values:


"""
sales.to_dict(orient='list')

{'UPC': [1234, 1234, 1234, 789, 789, 789, 789],
 'Units': [5.0, 2.0, 3.0, 1.0, 2.0, nan, 1.0],
 'Sales': [20.2, 8.0, 13.0, 2.0, 3.8, nan, 1.8],
 'Date': ['1-1-2014',
  '1-2-2014',
  '1-3-2014',
  '1-1-2014',
  '1-2-2014',
  '1-3-2014',
  '1-5-2014']}
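
Another commonly used value is `orient='records'`, which produces one dictionary per row. The sketch below applies it to a hypothetical frame `df` and rebuilds a frame from the resulting list.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'UPC': [1234, 789], 'Units': [5.0, 1.0]})

# One dict per row, keyed by column name
records = df.to_dict(orient='records')
print(records)

# A list of row dicts can be passed straight to the DataFrame constructor
rebuilt = pd.DataFrame(records)
print(rebuilt)
```

The row-oriented form drops the index, so it suits data whose index is just the default row numbering.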

In [94]:
# Data frames can also be created from the serialized dict if needed:
pd.DataFrame.from_dict(sales.to_dict())

,UPC,Units,Sales,Date
0,1234,5.0,20.2,1-1-2014
1,1234,2.0,8.0,1-2-2014
2,1234,3.0,13.0,1-3-2014
3,789,1.0,2.0,1-1-2014
4,789,2.0,3.8,1-2-2014
5,789,,,1-3-2014
6,789,1.0,1.8,1-5-2014


In [89]:
# In addition, data frames can read and write Excel files. Use the
# .to_excel method to dump the data out:
# The with block saves and closes the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('/tmp/ouptput.xlsx') as writer:
    sales.to_excel(writer, sheet_name='sheet1')



In [95]:
# We can also read Excel data:
pd.read_excel('/content/Historicalinvesttemp.xlsx')

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3
0,,,,
1,,,,
2,,,,
3,,,,
4,,Annual Returns on Investments in,,
...,...,...,...,...
85,2007,0.0549,0.0988,0.0466
86,2008,-0.37,0.2587,0.016
87,2009,0.2646,-0.149,0.001
88,,stocks,tbills,bonds
