#Basic data structures in pandas

In [3]:
import pandas as pd
import numpy as np

Pandas provides two types of classes for handling data:

#Series: 
    a one-dimensional labeled array holding data of any type, such as integers, strings, or Python objects.

#DataFrame:
    a two-dimensional data structure that holds data like a two-dimensional array or a table with rows and columns.
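
A minimal sketch of how the two classes relate. The names `table` and `column` and the sample values are illustrative only and do not come from the original notebook. Selecting one column of a DataFrame gives back a Series that carries the frame's row labels.

```python
import pandas as pd

# A DataFrame: two-dimensional, with labeled rows ("a", "b", "c") and columns.
table = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Selecting a single column returns a one-dimensional Series
# that keeps the same row labels.
column = table["price"]

print(type(column).__name__)   # Series
print(list(column.index))      # ['a', 'b', 'c']
print(table.shape)             # (3, 2)
```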

#Object creation


Creating a Series by passing a list of values, letting pandas create a default RangeIndex.

In [4]:
s = pd.Series([1,3,5,np.nan,6,8])

In [5]:
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
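
To make the default index and the resulting dtype visible, the sketch below inspects the same Series. The comments describe standard pandas behavior: `np.nan` is a float, so the integers in the list are stored as float64.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# No index was passed, so pandas builds a RangeIndex from 0 to len(s) - 1.
print(s.index)          # RangeIndex(start=0, stop=6, step=1)

# np.nan is a float, so the integer values are stored as float64.
print(s.dtype)          # float64

# isna() marks the missing entry; sum() counts the True values.
print(s.isna().sum())   # 1
```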

Creating a DataFrame by passing a NumPy array with a datetime index using date_range() and labeled columns:

In [6]:
dates = pd.date_range("20240824",periods=8)

In [7]:
dates

DatetimeIndex(['2024-08-24', '2024-08-25', '2024-08-26', '2024-08-27',
               '2024-08-28', '2024-08-29', '2024-08-30', '2024-08-31'],
              dtype='datetime64[ns]', freq='D')
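
As the `freq='D'` in the output shows, date_range() defaults to a daily frequency. A short sketch of two equivalent ways to build the same index; the name `same` is illustrative and not part of the original.

```python
import pandas as pd

# Start date plus a number of periods: 8 consecutive days.
dates = pd.date_range("20240824", periods=8)
print(len(dates))          # 8
print(dates[-1].date())    # 2024-08-31

# Start and end dates instead of periods give the same daily range.
same = pd.date_range(start="2024-08-24", end="2024-08-31")
print(dates.equals(same))  # True
```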

In [9]:
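df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

ValueError: Shape of passed values is (6, 4), indices imply (8, 4)

The array has 6 rows but dates has 8 entries; the number of rows must match the length of the index.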

In [10]:
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list("ABCD"))

In [11]:
df

                   A         B         C         D
2024-08-24  0.307516 -0.082928 -0.927577  1.580524
2024-08-25 -0.709782 -0.446830  0.898878  1.069193
2024-08-26 -0.118648  1.307614  0.206708 -0.341986
2024-08-27 -0.372750 -0.024332  1.248550  0.331454
2024-08-28  0.828507  1.381147 -1.780132  0.220608
2024-08-29 -0.063506 -0.114044 -1.557870  0.252668
2024-08-30 -1.679200 -1.059792  0.334396  0.207468
2024-08-31  1.225754  0.529177 -1.502982  1.227748
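
The ValueError in cell In [9] came from hard-coding a row count that did not match the index. One way to keep the two in sync, sketched here, is to size the array from the labels themselves. The seeded generator from `np.random.default_rng` is an addition for reproducibility; the original uses `np.random.randn`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20240824", periods=8)
columns = list("ABCD")

# Seeded generator (an assumption for reproducibility, not in the original).
rng = np.random.default_rng(0)

# Size the array from the labels so rows and index cannot disagree.
values = rng.standard_normal((len(dates), len(columns)))
df = pd.DataFrame(values, index=dates, columns=columns)

print(df.shape)  # (8, 4)
```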


Creating a DataFrame by passing a dictionary of objects where the keys are the column labels and the values are the column values.

In [12]:
df2 = pd.DataFrame({
    "A":1.0,
})

ValueError: If using all scalar values, you must pass an index
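
When every value in the dictionary is a scalar, pandas cannot tell how many rows to create. Two standard ways around this are sketched below; the names `one_row` and `three_rows` are illustrative.

```python
import pandas as pd

# Option 1: wrap the scalar in a list, giving a single row.
one_row = pd.DataFrame({"A": [1.0]})

# Option 2: pass an explicit index; the scalar is broadcast to every row.
three_rows = pd.DataFrame({"A": 1.0}, index=range(3))

print(one_row.shape)             # (1, 1)
print(three_rows["A"].tolist())  # [1.0, 1.0, 1.0]
```

In the cells that follow, the Series in column "C" and the array in column "D" play the same role: they supply the length that the scalar columns are broadcast to.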

In [13]:
df2 = pd.DataFrame({
    "A":1.0,
    "B":pd.TimeStamp("20240822"),
})

AttributeError: module 'pandas' has no attribute 'TimeStamp'

The class is spelled Timestamp, with a lowercase "s".

In [14]:
df2 = pd.DataFrame({
    "A":1.0,
    "B":pd.Timestamp("20240822"),
    "C":pd.Series(1,index=list(range(4),dtype="float32"),
    "D":np.array([3]*4, dtype="int32"),
    "F":"Foo"
})

SyntaxError: closing parenthesis '}' does not match opening parenthesis '(' on line 4 (3630114223.py, line 7)

The closing parenthesis of list(range(4)) is missing, so the call to pd.Series( is never closed.

In [15]:
df2 = pd.DataFrame({
    "A":1.0,
    "B":pd.Timestamp("20240822"),
    "C":pd.Series(1,index=list(range(4)),dtype="float32"),
    "D":np.array([3]*4, dtype="int32"),
    "F":"Foo"
})

In [16]:
df2

     A          B    C  D    F
0  1.0 2024-08-22  1.0  3  Foo
1  1.0 2024-08-22  1.0  3  Foo
2  1.0 2024-08-22  1.0  3  Foo
3  1.0 2024-08-22  1.0  3  Foo


In [17]:
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
F           object
dtype: object
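
Each column keeps its own dtype, and `dtypes` is itself a Series indexed by column label. The sketch below rebuilds a reduced version of df2 (without the Timestamp column) and uses `select_dtypes`, a standard DataFrame method that does not appear in the original, to keep only the numeric columns.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "F": "Foo",
})

# dtypes is a Series: column labels are its index, dtypes are its values.
print(df2.dtypes["C"])         # float32

# Keep only the numeric columns (A, C and D here).
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))   # ['A', 'C', 'D']
```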

#Viewing data

#Use DataFrame.head() to view the top rows of the frame

In [21]:
df.head

<bound method NDFrame.head of                    A         B         C         D
2024-08-24  0.307516 -0.082928 -0.927577  1.580524
2024-08-25 -0.709782 -0.446830  0.898878  1.069193
2024-08-26 -0.118648  1.307614  0.206708 -0.341986
2024-08-27 -0.372750 -0.024332  1.248550  0.331454
2024-08-28  0.828507  1.381147 -1.780132  0.220608
2024-08-29 -0.063506 -0.114044 -1.557870  0.252668
2024-08-30 -1.679200 -1.059792  0.334396  0.207468
2024-08-31  1.225754  0.529177 -1.502982  1.227748>
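
The output above is the repr of the method object, because df.head was referenced without parentheses. The sketch below contrasts the two forms on a small stand-in frame (hypothetical data, not the frame from the original).

```python
import pandas as pd

df = pd.DataFrame({"A": range(8), "B": range(8, 16)})

# Without parentheses this is only a reference to the method.
method = df.head
print(callable(method))   # True

# With parentheses the method runs and returns a new DataFrame.
top = df.head()           # n defaults to 5
print(len(top))           # 5
```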

In [25]:
df.head(3)

                   A         B         C         D
2024-08-24  0.307516 -0.082928 -0.927577  1.580524
2024-08-25 -0.709782 -0.446830  0.898878  1.069193
2024-08-26 -0.118648  1.307614  0.206708 -0.341986


#Use DataFrame.tail() to view the bottom rows of the frame

In [23]:
df.tail

<bound method NDFrame.tail of                    A         B         C         D
2024-08-24  0.307516 -0.082928 -0.927577  1.580524
2024-08-25 -0.709782 -0.446830  0.898878  1.069193
2024-08-26 -0.118648  1.307614  0.206708 -0.341986
2024-08-27 -0.372750 -0.024332  1.248550  0.331454
2024-08-28  0.828507  1.381147 -1.780132  0.220608
2024-08-29 -0.063506 -0.114044 -1.557870  0.252668
2024-08-30 -1.679200 -1.059792  0.334396  0.207468
2024-08-31  1.225754  0.529177 -1.502982  1.227748>

In [24]:
df.tail(4)

                   A         B         C         D
2024-08-28  0.828507  1.381147 -1.780132  0.220608
2024-08-29 -0.063506 -0.114044 -1.557870  0.252668
2024-08-30 -1.679200 -1.059792  0.334396  0.207468
2024-08-31  1.225754  0.529177 -1.502982  1.227748
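
Both head() and tail() also accept a negative n, which excludes rows instead of selecting them. A short sketch on a stand-in frame with one integer column (hypothetical data, chosen so the selected rows are easy to read):

```python
import pandas as pd

df = pd.DataFrame({"A": range(8)})

# Positive n: the first or last n rows.
print(df.head(3)["A"].tolist())   # [0, 1, 2]
print(df.tail(4)["A"].tolist())   # [4, 5, 6, 7]

# Negative n: everything except the last (head) or first (tail) |n| rows.
print(df.head(-2)["A"].tolist())  # [0, 1, 2, 3, 4, 5]
print(df.tail(-2)["A"].tolist())  # [2, 3, 4, 5, 6, 7]
```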


#Display the DataFrame.index or DataFrame.columns