In [1]:
import pandas as pd
import numpy as np

Create a DataFrame and set its index labels to the integers 10, 20, and 30. We then access the index attribute of the DataFrame, which returns an Index object containing the index labels.

In [14]:
df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Aritra'],
        'Age': [25, 30, 35],
        'Location': ['Seattle', 'New York', 'Kona'],
    },
    index=[10, 20, 30],
)

In [15]:
df

      Name  Age  Location
10   Alice   25   Seattle
20     Bob   30  New York
30  Aritra   35      Kona


<h3 align='left' style='color:blue'> .index</h3>

In [3]:
df.index

Index([10, 20, 30], dtype='int64')

Modify the index labels of the DataFrame by assigning a new list of labels to the index attribute.

In [4]:
df.index = [100, 200, 300]

In [5]:
df.index

Index([100, 200, 300], dtype='int64')
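A short sketch, assuming pandas 2.x, of two related behaviours: the list assigned to index must contain one label per row (otherwise pandas raises a ValueError and keeps the old labels), and DataFrame.set_axis returns a relabelled copy instead of changing the original. The frame name people and the labels 'a', 'b', 'c' are illustrative.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Aritra"], "Age": [25, 30, 35]},
    index=[10, 20, 30],
)

# Direct assignment replaces the row labels in place
people.index = [100, 200, 300]

# The new labels must match the number of rows
try:
    people.index = [1, 2]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# set_axis returns a new DataFrame; `people` keeps its labels
relabelled = people.set_axis(["a", "b", "c"], axis=0)
print(people.index.tolist())      # [100, 200, 300]
print(relabelled.index.tolist())  # ['a', 'b', 'c']
```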

<h3 align='left' style='color:blue'>.columns</h3>

The column labels of the DataFrame.

In [6]:
df.columns

Index(['Name', 'Age', 'Location'], dtype='object')
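Because columns is an Index, it supports list conversion and membership tests, and DataFrame.rename relabels selected columns in a returned copy. A minimal sketch; the replacement name City is hypothetical.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice"], "Age": [25], "Location": ["Seattle"]})

# columns is an Index; tolist() gives a plain Python list
print(people.columns.tolist())  # ['Name', 'Age', 'Location']

# Membership tests work directly on the Index
print("Age" in people.columns)  # True

# rename returns a copy; the original column labels are unchanged
renamed = people.rename(columns={"Location": "City"})
print(renamed.columns.tolist())  # ['Name', 'Age', 'City']
```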

<h3 align='left' style='color:blue'> .dtypes</h3>

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. 

In [7]:
df.dtypes

Name        object
Age          int64
Location    object
dtype: object
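Since dtypes is an ordinary Series keyed by column name, you can look up a single column's dtype or filter column names by dtype. A sketch using pandas.api.types.is_numeric_dtype; the list name numeric_cols is illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Look up one column's dtype by label
print(people.dtypes["Age"])  # int64

# Keep only the names of numeric columns
numeric_cols = [col for col, dtype in people.dtypes.items() if is_numeric_dtype(dtype)]
print(numeric_cols)  # ['Age']
```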

<h3 align='left' style='color:blue'> info()</h3>

Print a concise summary of a DataFrame.

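This method prints information about a DataFrame, including the index type and range, the column names and dtypes, the non-null counts, and the memory usage. Passing memory_usage=False omits the memory line, and show_counts=False omits the Non-Null Count column.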

In [12]:
df.info(memory_usage=False)

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 100 to 300
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Name      3 non-null      object
 1   Age       3 non-null      int64 
 2   Location  3 non-null      object
dtypes: int64(1), object(2)

In [13]:
df.info(show_counts=False, memory_usage=False)

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 100 to 300
Data columns (total 3 columns):
 #   Column    Dtype 
---  ------    ----- 
 0   Name      object
 1   Age       int64 
 2   Location  object
dtypes: int64(1), object(2)
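info() writes to standard output and returns None. If you want the summary as a string (for logging, say), one option is to pass an io.StringIO object through the buf parameter. A sketch; the small frame with a missing Age value is made up for illustration.

```python
import io

import pandas as pd

# Illustrative frame: Age has one missing value
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, None]})

# buf= redirects the text that info() would otherwise print
buffer = io.StringIO()
people.info(buf=buffer, memory_usage=False)
summary = buffer.getvalue()

print("1 non-null" in summary)  # True: only one Age value is present
```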

<h3 align='left' style='color:blue'> select_dtypes()</h3>

Return a subset of the DataFrame’s columns based on the column dtypes.

In [28]:
df.select_dtypes(include='object')

       Name  Location
100   Alice   Seattle
200     Bob  New York
300  Aritra      Kona


In [32]:
df.select_dtypes(include='number')

     Age
100   25
200   30
300   35


In [35]:
df.select_dtypes(include='int64')

     Age
100   25
200   30
300   35
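select_dtypes also takes an exclude argument, and include accepts a list of dtypes. A sketch; the Score column is a hypothetical float column added only to show the list form.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Aritra"],
    "Age": [25, 30, 35],
    "Score": [88.5, 92.0, 79.5],  # hypothetical float column
})

# exclude= drops the matching dtypes and keeps everything else
non_numeric = people.select_dtypes(exclude="number")
print(non_numeric.columns.tolist())  # ['Name']

# include= accepts a list of dtypes
numeric = people.select_dtypes(include=["int64", "float64"])
print(numeric.columns.tolist())  # ['Age', 'Score']
```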


<h3 align='left' style='color:blue'>.values</h3>

Return a NumPy representation of the DataFrame. Only the values are returned; the axis labels are removed.



In [39]:
df.values

array([['Alice', 25, 'Seattle'],
       ['Bob', 30, 'New York'],
       ['Aritra', 35, 'Kona']], dtype=object)
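The dtype of the array returned by values depends on the column dtypes: mixed columns fall back to object, as in the output above, while homogeneous numeric columns keep their numeric dtype. The pandas documentation recommends to_numpy() over values. A small sketch with made-up frames:

```python
import pandas as pd

mixed = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
numeric = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Strings and integers together force a common object dtype
print(mixed.values.dtype)    # object
# Integer-only columns stay int64
print(numeric.values.dtype)  # int64
```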

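<h3 align='left' style='color:blue'> to_numpy()</h3>

Convert the DataFrame to a NumPy array. Because a NumPy array has a single dtype, pandas chooses one that can hold every column: two integer columns stay integer, while an integer column next to a float column with missing values (NaN) becomes float64.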

In [40]:
pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()

array([[1, 3],
       [2, 4]])

In [51]:
d = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
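The frame in the cell above mixes an int64 column with a float64 column. A sketch of what to_numpy() returns for that data: both columns are converted to the common dtype float64. The name mixed_nums is used so the notebook's d is not overwritten.

```python
import numpy as np
import pandas as pd

mixed_nums = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})

# int64 and float64 columns share the common dtype float64
arr = mixed_nums.to_numpy()
print(arr.dtype)     # float64
print(arr.tolist())  # [[1.0, 3.0], [2.0, 4.5]]
```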

In [60]:
d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
d = pd.DataFrame(data=d, index=[0, 1, 2, 3])

In [61]:
arry = d.to_numpy(copy=True)

In [62]:
arry

array([[ 0., nan],
       [ 1., nan],
       [ 2.,  2.],
       [ 3.,  3.]])
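to_numpy() also accepts na_value, which replaces missing entries in the returned array. NaN has no integer representation, so a sketch that wants an integer array fills the gaps first and then passes dtype. The fill value -1 and the name gappy are illustrative choices.

```python
import numpy as np
import pandas as pd

gappy = pd.DataFrame(
    {"col1": [0, 1, 2, 3], "col2": pd.Series([2, 3], index=[2, 3])},
    index=[0, 1, 2, 3],
)

# Missing values become -1.0 in the float64 result
filled = gappy.to_numpy(na_value=-1)
print(filled.tolist())  # [[0.0, -1.0], [1.0, -1.0], [2.0, 2.0], [3.0, 3.0]]

# Fill the NaNs before asking for an integer dtype
as_int = gappy.fillna(-1).to_numpy(dtype="int64")
print(as_int.tolist())  # [[0, -1], [1, -1], [2, 2], [3, 3]]
```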

<h3 align='left' style='color:blue'> .axes</h3>

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

In [63]:
df.axes

[Index([100, 200, 300], dtype='int64'),
 Index(['Name', 'Age', 'Location'], dtype='object')]
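Because axes always holds the row labels first and the column labels second, it can be unpacked into two names. A sketch; row_labels and col_labels are illustrative names.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]}, index=[10, 20])

# axes is [row labels, column labels]
row_labels, col_labels = people.axes
print(row_labels.equals(people.index))    # True
print(col_labels.equals(people.columns))  # True
```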

<h3 align='left' style='color:blue'>.ndim</h3>

Return an int representing the number of axes (array dimensions).

Return 1 for a Series and 2 for a DataFrame.

In [67]:
df.ndim

2
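Selecting a single column gives a Series, so the number of dimensions drops from 2 to 1. A minimal sketch with an illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

print(frame.ndim)               # 2
print(frame["A"].ndim)          # 1: one column is a Series
print(frame.to_numpy().ndim)    # 2: same as the NumPy array
```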

<h3 align='left' style='color:blue'>.size</h3>

Return an int representing the number of elements in this object.

For a Series this is the number of rows; for a DataFrame it is the number of rows times the number of columns.

In [70]:
df.size

9
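size counts every cell, including missing ones; count() is one way to count only the non-missing cells. A sketch with an illustrative frame containing one NaN:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"A": [1, np.nan, 3], "B": [4, 5, 6]})

rows, cols = frame.shape
print(frame.size == rows * cols)  # True
print(frame.size)                 # 6: the NaN is still counted
print(frame["A"].size)            # 3
print(frame.count().sum())        # 5: non-missing cells only
```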

<h3 align='left' style='color:blue'>.shape</h3>

Return a tuple representing the dimensionality of the DataFrame.



In [72]:
df.shape

(3, 3)
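shape is a (rows, columns) tuple, so it unpacks into two integers; len() returns the row count. A sketch with an illustrative frame:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob", "Aritra"], "Age": [25, 30, 35]})

n_rows, n_cols = people.shape
print(n_rows, n_cols)          # 3 2
print(len(people) == n_rows)   # True: len() counts rows
print(people["Age"].shape)     # (3,): a Series has a one-element tuple
```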

<h3 align='left' style='color:blue'>.astype()</h3>

Cast a pandas object to a specified dtype.

In [81]:
f = {'col1': [1, 2], 'col2': [3, 4]}
f = pd.DataFrame(data=f)

In [84]:
f

   col1  col2
0     1     3
1     2     4


In [79]:
f.dtypes

col1    int64
col2    int64
dtype: object

In [80]:
# converting the whole dataframe to int32
f.astype('int32').dtypes

col1    int32
col2    int32
dtype: object

Cast multiple columns to different data types by passing a dictionary that maps column names to dtypes:

In [83]:
f.astype({'col1':'int32','col2':'float16'}).dtypes

col1      int32
col2    float16
dtype: object
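Two points worth keeping in mind, shown in a small sketch: astype returns a new object (reassign it to keep the result), and values that cannot be converted raise a ValueError. The frames frame and bad are illustrative; bad is built with dtype=object so the cast goes through NumPy's string-to-int conversion.

```python
import pandas as pd

frame = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# astype returns a copy; frame itself is unchanged
g = frame.astype({"col2": "float64"})
print(frame["col2"].dtype, g["col2"].dtype)  # int64 float64

# 'a' cannot be parsed as an integer, so the cast fails
bad = pd.DataFrame({"x": ["1", "2", "a"]}, dtype=object)
try:
    bad.astype("int64")
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```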

<h3 align='left' style='color:blue'>at[]</h3>

Access a single value for a row/column label pair.

Get the value at a specified row/column label pair

In [89]:
f.at[0, 'col2']

np.int64(3)

Set the value at a specified row/column label pair



In [97]:
f.at[0, 'col2'] = 10
f.at[0, 'col2']

np.int64(10)
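at and loc agree on a single row/column label pair; at is meant for exactly one value, so lists or slices of labels go through loc instead. A sketch with an illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Same cell through at and loc
print(frame.at[1, "col1"] == frame.loc[1, "col1"])  # True

# Several rows at once need loc
print(frame.loc[[0, 1], "col1"].tolist())  # [1, 2]
```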

<h3 align='left' style='color:blue'>iat[]</h3>

Access a single value for a row/column pair by integer position.

Get the value at a specified row/column position pair

In [98]:
f.iat[1,1]

np.int64(4)

Set the value at a specified row/column position pair

In [99]:
f.iat[1,1] = 90
f.iat[1,1]

np.int64(90)
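With the default 0, 1, 2, ... index, positions and labels coincide, which hides the difference between iat and at. A sketch using a hypothetical frame g whose row labels are 10 and 20:

```python
import pandas as pd

# Row labels (10, 20) differ from row positions (0, 1)
g = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[10, 20])

print(g.iat[0, 1])                  # 3: first row, second column by position
print(g.at[10, "b"])                # 3: the same cell by label
print(g.iat[1, 1] == g.iloc[1, 1])  # True
```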

<h3 align='left' style='color:blue'>loc[]</h3>

Access a group of rows and columns by label(s) or a boolean array.

In [101]:
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

In [103]:
df

            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8


Single label. Note this returns the row as a Series.

In [104]:
df.loc['viper']

max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note that using [[]] returns a DataFrame.

In [102]:
df.loc[['viper', 'sidewinder']]

            max_speed  shield
viper               4       5
sidewinder          7       8
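The difference between a single label and a list of labels shows up in the type and shape of the result. A sketch that rebuilds the same data under the illustrative name snakes so the notebook's df is untouched:

```python
import pandas as pd

snakes = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

row_series = snakes.loc["viper"]   # single label: Series
row_frame = snakes.loc[["viper"]]  # list of labels: DataFrame
print(type(row_series).__name__, row_series.shape)  # Series (2,)
print(type(row_frame).__name__, row_frame.shape)    # DataFrame (1, 2)
```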


Single label for row and column

In [110]:
df.loc['sidewinder', 'shield']

np.int64(8)

Slice with labels for both rows and columns. Note that, unlike positional slicing, both the start and stop labels of the slice are included.



In [108]:
df.loc['cobra':'viper', 'max_speed':'shield']

       max_speed  shield
cobra          1       2
viper          4       5
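A side-by-side sketch of the slicing rule: loc includes the stop label, while iloc follows Python's convention and excludes the stop position. The frame name snakes is illustrative.

```python
import pandas as pd

snakes = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Label slice: 'viper' is included
print(snakes.loc["cobra":"viper"].index.tolist())  # ['cobra', 'viper']
# Position slice: position 1 is excluded
print(snakes.iloc[0:1].index.tolist())             # ['cobra']
```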


Boolean list with the same length as the row axis

In [120]:
df.loc[[True, False, True],['max_speed']]

            max_speed
cobra               1
sidewinder          7


In [122]:
df.loc[[True, False, True],'max_speed']

cobra         1
sidewinder    7
Name: max_speed, dtype: int64

Conditional that returns a boolean Series



In [115]:
df.loc[df['shield'] > 6]

            max_speed  shield
sidewinder          7       8


Conditional that returns a boolean Series with column labels specified



In [121]:
df.loc[df['shield'] > 6, 'max_speed']

sidewinder    7
Name: max_speed, dtype: int64

In [119]:
df.loc[df['shield'] > 6, ['max_speed']]

            max_speed
sidewinder          7


In [118]:
df.loc[df['shield'] > 6, ['max_speed','shield']]

            max_speed  shield
sidewinder          7       8


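Multiple conditions combined with & (element-wise AND). Wrap each condition in parentheses, because & binds more tightly than comparison operators such as > and <.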



In [123]:
df.loc[(df['max_speed'] > 1) & (df['shield'] < 8)]

       max_speed  shield
viper          4       5


In [124]:
df.loc[(df['max_speed'] > 1) & (df['shield'] < 8),['shield']]

       shield
viper       5
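A sketch of why the parentheses matter: with them, & combines two boolean Series element-wise; without them, Python evaluates 1 & snakes["shield"] first and then a chained comparison that needs the truth value of a whole Series, which pandas refuses with a ValueError. The frame name snakes is illustrative.

```python
import pandas as pd

snakes = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Parenthesised comparisons combined element-wise
mask = (snakes["max_speed"] > 1) & (snakes["shield"] < 8)
print(mask.tolist())  # [False, True, False]

# Unparenthesised: parsed as max_speed > (1 & shield) < 8
try:
    snakes.loc[snakes["max_speed"] > 1 & snakes["shield"] < 8]
    precedence_error = False
except ValueError:
    precedence_error = True
print(precedence_error)  # True
```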


Multiple conditions combined with | (element-wise OR), with each condition in parentheses.



In [125]:
df.loc[(df['max_speed'] > 4) | (df['shield'] < 5)]

            max_speed  shield
cobra               1       2
sidewinder          7       8


In [126]:
df.loc[(df['max_speed'] > 4) | (df['shield'] < 5),['shield']]

            shield
cobra            2
sidewinder       8
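Besides & and |, the ~ operator negates a boolean Series, which selects the rows where a condition is false. A sketch on an illustrative copy of the data named snakes:

```python
import pandas as pd

snakes = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

mask = snakes["shield"] > 6
# ~mask flips True and False element-wise
print(snakes.loc[~mask].index.tolist())  # ['cobra', 'viper']
```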


## Setting values

Set a value for all items matching the list of labels

In [127]:
df.loc[['viper', 'sidewinder'], ['shield']] = 50

In [130]:
df

            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50


Set value for an entire row

In [128]:
df.loc['cobra'] = 10

In [129]:
df

            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50
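Assigning to a row label that does not exist yet appends a new row; the pandas user guide calls this setting with enlargement. A sketch in which 'mamba' and its values are hypothetical:

```python
import pandas as pd

snakes = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# 'mamba' is not in the index, so loc appends a row
snakes.loc["mamba"] = [3, 9]
print(snakes.index.tolist())  # ['cobra', 'viper', 'sidewinder', 'mamba']
print(snakes.shape)           # (4, 2)
```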


Set value for an entire column

In [131]:
df.loc[:, 'max_speed'] = 30

In [132]:
df

            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50


Set a value for every column of the rows matching a boolean condition

In [133]:
df.loc[df['shield'] > 35] = 0

In [134]:
df

            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0
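loc also accepts a callable: pandas calls it with the DataFrame and uses the boolean mask it returns, which is handy in method chains where the frame has no variable name yet. A sketch that starts from the values the cell above operates on; the name snakes is illustrative.

```python
import pandas as pd

# Values as they stand just before the assignment above
snakes = pd.DataFrame(
    {"max_speed": [30, 30, 30], "shield": [10, 50, 50]},
    index=["cobra", "viper", "sidewinder"],
)

# The lambda receives the DataFrame and returns a boolean mask
snakes.loc[lambda d: d["shield"] > 35] = 0
print(snakes.values.tolist())  # [[30, 10], [0, 0], [0, 0]]
```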


Add to the value at a specific row/column label pair

In [135]:
df.loc["viper", "shield"] += 5

In [136]:
df

            max_speed  shield
cobra              30      10
viper               0       5
sidewinder          0       0
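The in-place += also works with a boolean row mask, updating one column only in the matching rows. A sketch starting from the values shown above; the name snakes and the increment of 1 are illustrative.

```python
import pandas as pd

snakes = pd.DataFrame(
    {"max_speed": [30, 0, 0], "shield": [10, 5, 0]},
    index=["cobra", "viper", "sidewinder"],
)

# Add 1 to shield only in rows where max_speed is 0
snakes.loc[snakes["max_speed"] == 0, "shield"] += 1
print(snakes["shield"].tolist())  # [10, 6, 1]
```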
