
In [2]:
"""
Below is a simple attempt to create a tabular Python data structure that
is column oriented. It has a 0-based integer index, but that is not required;
the index could be string based. Each column is similar to the Series-like
structure developed previously:
"""

df = {
    'index': [0,1,2],
    'cols' : [
        {
            'name' : 'growth',
            'data' : [.5, .7, 1.2]
        },
        {
            'name': 'Name',
            'data' : ['Paul', 'George', 'Ringo']
        }
    ]
}

In [3]:
"""
Rows are accessed via the index, and columns are accessible from the
column name. Below are simple functions for accessing rows and
columns:

"""
def get_row(df, idx):
  results = []
  value_idx = df['index'].index(idx)
  for col in df['cols']:
    results.append(col['data'][value_idx])
  return results
get_row(df, 1)

[0.7, 'George']

In [4]:
def get_col(df, name):
  for col in df['cols']:
    if col['name'] == name:
      return col['data']
get_col(df, 'Name')


['Paul', 'George', 'Ringo']
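
The earlier remark that the index could be string based can be illustrated with a minimal sketch. The name `df_str` and the labels `'a'`, `'b'`, `'c'` are assumptions made up for this example; the two helper functions are repeated from the cells above only so that the block runs on its own.

```python
# Hypothetical variant of the dict-based structure, using string labels
# as the index instead of 0-based integers.
df_str = {
    'index': ['a', 'b', 'c'],
    'cols': [
        {'name': 'growth', 'data': [.5, .7, 1.2]},
        {'name': 'Name', 'data': ['Paul', 'George', 'Ringo']},
    ],
}

def get_row(df, idx):
    # Translate the index label into a list position, then collect
    # the value at that position from every column.
    value_idx = df['index'].index(idx)
    return [col['data'][value_idx] for col in df['cols']]

def get_col(df, name):
    # Return the data list of the first column whose name matches.
    for col in df['cols']:
        if col['name'] == name:
            return col['data']

row_b = get_row(df_str, 'b')
names = get_col(df_str, 'Name')
print(row_b)   # [0.7, 'George']
print(names)   # ['Paul', 'George', 'Ringo']
```

Because `get_row` looks the label up with `list.index`, it works unchanged for any label type stored in `'index'`.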

In [5]:
"""
Using the pandas DataFrame object, the previous data structure could be
created like this:

"""

import pandas as pd

df = pd.DataFrame({
    'growth': [.5, .7, 1.2],
    'Name' : ['Paul', 'George', 'Ringo']
})
df

   growth    Name
0     0.5    Paul
1     0.7  George
2     1.2   Ringo


In [6]:
# To access a row by location, index off of the .iloc attribute:
df.iloc[2]

growth      1.2
Name      Ringo
Name: 2, dtype: object
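
Positional access with `.iloc` also accepts negative positions, slices, and a row/column pair. The following is a small sketch of those forms, rebuilding the same data frame so that the block is self-contained; the variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'growth': [.5, .7, 1.2],
    'Name': ['Paul', 'George', 'Ringo'],
})

# A negative position counts from the end, as with Python lists.
last_row = df.iloc[-1]

# A slice returns a DataFrame; the end position is exclusive.
first_two = df.iloc[0:2]

# A (row, column) pair of positions returns a single value.
cell = df.iloc[1, 1]

print(last_row['Name'])   # Ringo
print(first_two.shape)    # (2, 2)
print(cell)               # George
```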

In [7]:
# Columns are accessed by indexing the object with the column name:
df['Name']

0      Paul
1    George
2     Ringo
Name: Name, dtype: object

In [None]:
# Note that each column is a pandas Series instance. Any operation that can be
# done on a Series can be applied to a column:
type(df['Name'])
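
As a rough illustration of applying Series operations to a column, the sketch below runs a reduction, a string method, and a comparison on the columns of the same data frame. The particular operations are arbitrary picks, not an exhaustive list.

```python
import pandas as pd

df = pd.DataFrame({
    'growth': [.5, .7, 1.2],
    'Name': ['Paul', 'George', 'Ringo'],
})

# Each column pulled out of the frame is a Series.
col_type = type(df['Name'])

# Reductions work on a numeric column.
max_growth = df['growth'].max()

# String methods are available through the .str accessor.
upper_names = df['Name'].str.upper()

# Comparisons are applied element-wise and return a boolean Series.
above = df['growth'] > 0.6

print(col_type is pd.Series)   # True
print(max_growth)              # 1.2
print(upper_names.tolist())    # ['PAUL', 'GEORGE', 'RINGO']
print(above.tolist())          # [False, True, True]
```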

## Construction
Data frames can be created from many types of input:

- columns (dict of lists)
- rows (list of dicts)
- CSV file (pd.read_csv)
- NumPy ndarray
- and more: SQL, HDF5, etc.
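
As one example of the inputs listed above, here is a minimal sketch of the CSV route. To keep it runnable without a file on disk, the CSV text is held in a string and wrapped in `io.StringIO`; the variable `csv_text` and its contents are assumptions for illustration, mirroring the data used elsewhere in this notebook.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "growth,Name\n0.5,Paul\n0.7,George\n1.2,Ringo\n"

# pd.read_csv accepts a path or any file-like object.
from_csv = pd.read_csv(io.StringIO(csv_text))

print(list(from_csv.columns))        # ['growth', 'Name']
print(from_csv['growth'].tolist())   # [0.5, 0.7, 1.2]
print(from_csv['Name'].tolist())     # ['Paul', 'George', 'Ringo']
```

The first line of the CSV is used as the column names by default, and a 0-based integer index is generated for the rows.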



In [9]:
# Below is an example of creating a data frame from rows:
pd.DataFrame([
    {'growth': .5, 'Name': 'Paul'},
    {'growth': .7, 'Name': 'George'},
    {'growth': 1.2, 'Name' : 'Ringo'},
])

   growth    Name
0     0.5    Paul
1     0.7  George
2     1.2   Ringo


In [14]:
"""
A data frame can be instantiated from a NumPy array as well. The array
carries no column names, so they should be specified (otherwise pandas
labels the columns 0, 1, 2, ...):
"""
import numpy as np

pd.DataFrame(np.random.randn(10,3), columns=['a', 'b', 'c'])

          a         b         c
0  0.909022  1.176494  0.517842
1  0.059360  0.064478  0.018531
2 -1.062349 -0.098215 -0.190966
3 -1.204134  0.244188 -1.452391
4  0.271832  0.183028  1.038340
5  0.368408  0.357785  0.863463
6 -0.510670 -0.241941  0.143706
7 -0.222193  0.531379 -0.308134
8  0.361678  1.561328  1.627264
9 -0.820552  0.262803  0.244726
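
The random values above will differ on every run. A possible variation, sketched below, uses a seeded generator so the numbers are repeatable, and also shows what happens when `columns` is left out. The seed value and the names `named` and `unnamed` are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values reproducible between runs.
rng = np.random.default_rng(seed=0)
values = rng.standard_normal((4, 3))

# With explicit column names.
named = pd.DataFrame(values, columns=['a', 'b', 'c'])

# Without column names, pandas falls back to integer labels.
unnamed = pd.DataFrame(values)

print(list(named.columns))     # ['a', 'b', 'c']
print(list(unnamed.columns))   # [0, 1, 2]
print(named.shape)             # (4, 3)
```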


In [15]:
# Data Frame Axis
"""
Unlike a series, which has one axis, there are two axes for a data frame.
They are commonly referred to as axis 0 and 1, or the row/index axis and
the columns axis respectively:
"""
df.axes

[RangeIndex(start=0, stop=3, step=1),
 Index(['growth', 'Name'], dtype='object')]

In [16]:
"""
As many operations take an axis parameter, it is important to remember
that 0 is the index and 1 is the columns:
"""
df.axes[0]

RangeIndex(start=0, stop=3, step=1)

In [17]:
df.axes[1]

Index(['growth', 'Name'], dtype='object')
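
To make the axis convention concrete, the sketch below applies `sum` along each axis of a small numeric data frame and drops a column by passing `axis=1`. The frame `nums` and its values are made up for this illustration.

```python
import pandas as pd

# Hypothetical all-numeric frame, so that sums are meaningful.
nums = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 works down the index: one result per column.
col_totals = nums.sum(axis=0)

# axis=1 works across the columns: one result per row.
row_totals = nums.sum(axis=1)

# drop needs axis=1 to interpret the label as a column name.
without_b = nums.drop('b', axis=1)

print(col_totals.tolist())       # [6, 60]
print(row_totals.tolist())       # [11, 22, 33]
print(list(without_b.columns))   # ['a']
```

One way to remember the convention: the axis passed is the one that gets collapsed, so `axis=0` collapses the rows and leaves a value for each column.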