When working with data in Python, understanding the structure and content of our dataset is important. The dataframe.info() method in Pandas provides a concise summary of a DataFrame, which helps us quickly assess its structure, identify issues like missing values and optimize memory usage.

Key features of dataframe.info() include:

* Number of entries (rows) in the DataFrame.
* Column names and their associated data types like integer, float, object, etc.
* The number of non-null values in each column which is useful for spotting missing data.
* A summary of how much memory the DataFrame is consuming.  
In this article we'll see how to use dataframe.info() to streamline our data exploration process.  

Let's see an example for better understanding. Here we’ll be using the Pandas library and the NBA dataset stored in nba.csv. We will display a concise summary of the DataFrame using the info() method.

In [1]:
import pandas as pd

In [2]:
df=pd.read_csv("nba.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   373 non-null    object 
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB


Here info() provides an overview of the DataFrame's structure such as number of entries, column names, data types and non-null counts.

##### Syntax of dataframe.info()  
DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)  
Parameters: 

1. verbose: Controls the level of detail in the summary.

True: Displays the full summary.  
False: Provides a concise summary.  

2. memory_usage: Shows memory usage of the DataFrame.

True: Displays basic memory usage.  
'deep': Provides a detailed view, including memory usage of each column’s objects.  

3. show_counts: Controls whether the number of non-null entries is displayed. Older Pandas releases called this parameter null_counts; that name was deprecated and is no longer accepted in Pandas 2.0 and later.

True: Shows non-null counts for each column.  
False: Excludes non-null counts for a cleaner summary.  
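The buf parameter in the signature is not demonstrated in this article, so here is a minimal sketch of it. The `sample` DataFrame is a hypothetical stand-in for the NBA data, and `show_counts` is the name the non-null count option has in recent Pandas releases.

```python
import io

import pandas as pd

# Hypothetical stand-in for the NBA data: two columns, one missing salary.
sample = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder", "John Holland"],
    "Salary": [7730337.0, 6796117.0, None],
})

# info() prints its report and returns None, so we pass a text buffer
# through buf to capture the summary as a string instead.
buffer = io.StringIO()
sample.info(buf=buffer, show_counts=True)
summary = buffer.getvalue()

print(summary)
```

Capturing the text this way is handy when the summary needs to go to a log file rather than the console.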

### 1. Shortened Summary with verbose=False
Here we will use the verbose parameter to generate a more concise summary of the DataFrame. By setting verbose=False we exclude detailed column information such as the number of non-null values which is useful when working with large datasets where we might not need all the details.

In [3]:
df.info(verbose=False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Columns: 9 entries, Name to Salary
dtypes: float64(4), object(5)
memory usage: 32.3+ KB


### 2. Full Summary without the Memory Usage Line
The memory_usage parameter controls whether memory consumption is reported in the summary. With memory_usage=True, info() adds a line showing how much memory the DataFrame uses, covering both the data and the index. Here we pass memory_usage=False, so the full column summary is printed but the memory usage line is omitted.

In [4]:
df.info(memory_usage=False)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   373 non-null    object 
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
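The `+` in a line such as `memory usage: 32.3+ KB` indicates a lower bound, because the contents of object columns are not inspected unless a deep measurement is requested. The sketch below compares the two measurements on a small hypothetical DataFrame (`sample` is not part of the NBA data).

```python
import pandas as pd

# Hypothetical sample with one string column and one numeric column.
sample = pd.DataFrame({
    "Team": ["Boston Celtics", "Utah Jazz", "Phoenix Suns"],
    "Age": [25.0, 26.0, 19.0],
})

# Shallow measurement: does not inspect the Python objects behind
# object columns, so it can undercount string data.
shallow = sample.memory_usage(deep=False).sum()

# Deep measurement: also counts the memory held by the stored objects.
deep = sample.memory_usage(deep=True).sum()

print("shallow bytes:", shallow)
print("deep bytes:", deep)

# The same deep measurement can be requested directly from info().
sample.info(memory_usage="deep")
```

The deep measurement is slower on large DataFrames, which is presumably why it is not the default.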

The describe() method in Pandas generates descriptive statistics of DataFrame columns which provides key metrics like mean, standard deviation, percentiles and more. It works with numeric data by default but can also handle categorical data which offers insights like the most frequent value and the number of unique entries. In this article, we'll see how to use describe() for both numeric and categorical data.

##### Syntax:
DataFrame.describe(percentiles=None, include=None, exclude=None)  

##### Parameters:

* percentiles: A list of numbers between 0 and 1 specifying which percentiles to return. The default is None which returns the 25th, 50th and 75th percentiles.
* include: A list of data types to include in the summary, like int, float or object (for strings). The default is None, which means only numeric columns are included.
* exclude: A list of data types to exclude from the summary. The default is None, which means no types are excluded.  
The describe() method returns a statistical summary of the DataFrame or Series which helps us understand the key characteristics of our data quickly. Let's see some examples for better understanding.  

### 1. Using describe() on a DataFrame
Here we will see how the describe() method generates a statistical summary for numeric columns such as age and salary. This is a basic use case of describe() to give us an overview of key statistical metrics across the dataset.

In [5]:
print("NBA Dataset:")
print(df.head())

print("\nSummary Table Generated by .describe() Method:")
print(df.describe())

NBA Dataset:
            Name            Team  Number Position   Age Height  Weight  \
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   

             College     Salary  
0              Texas  7730337.0  
1          Marquette  6796117.0  
2  Boston University        NaN  
3      Georgia State  1148640.0  
4                NaN  5000000.0  

Summary Table Generated by .describe() Method:
           Number         Age      Weight        Salary
count  457.000000  457.000000  457.000000  4.460000e+02
mean    17.678337   26.938731  221.522976  4.842684e+06
std     15.966090    4.404016   26.368343  5.229238e+06
min      0.000000   19.000000  161.000000  3.088800e+04
25%      5

Descriptive Statistics for Numerical Columns generated using .describe() Method

1. count: Total number of non-null entries in each column.
2. mean: Average (mean) of the values in the column.
3. std: Standard deviation showing how spread out the values are.
4. min: Minimum value in the column.
5. 25%: The 25th percentile (Q1) which means 25% of the data points are less than this value.
6. 50%: Median value (50th percentile) where half the data points are below it.
7. 75%: The 75th percentile (Q3) means 75% of the data points are below this value.
8. max: Maximum value in the column.  
This summary provides us a quick overview of the numeric columns in the dataset which helps us understand the distribution of key variables like age and salary.
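To make the meaning of each row concrete, the following sketch recomputes the same statistics by hand for a small hypothetical Series (`ages` is made up for illustration) and places them next to the output of describe().

```python
import pandas as pd

# Hypothetical ages used only for illustration.
ages = pd.Series([25.0, 25.0, 27.0, 22.0, 29.0], name="Age")

summary = ages.describe()

# Recompute every row of the summary with the matching Series method.
manual = pd.Series({
    "count": ages.count(),        # number of non-null entries
    "mean": ages.mean(),
    "std": ages.std(),            # sample standard deviation (ddof=1)
    "min": ages.min(),
    "25%": ages.quantile(0.25),
    "50%": ages.median(),
    "75%": ages.quantile(0.75),
    "max": ages.max(),
})

print(pd.DataFrame({"describe": summary, "manual": manual}))
```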

### 2. Customizing describe() with Percentiles
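We can customize the describe() method by specifying custom percentiles. By passing a list of percentiles we can obtain more detailed insights into our data’s distribution beyond the default 25th, 50th and 75th percentiles. In the example below we also pass an include list so that object (string) columns are summarized alongside the numeric ones.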

In [6]:
percentiles = [.20, .40, .60, .80]
include = ['object', 'float', 'int']

desc = df.describe(percentiles=percentiles, include=include)

desc

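| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| count | 457 | 457 | 457.0 | 457 | 457.0 | 457 | 457.0 | 373 | 446.0 |
| unique | 457 | 30 | NaN | 5 | NaN | 18 | NaN | 118 | NaN |
| top | Jeff Withey | New Orleans Pelicans | NaN | SG | NaN | 6-9 | NaN | Kentucky | NaN |
| freq | 1 | 19 | NaN | 102 | NaN | 59 | NaN | 22 | NaN |
| mean | NaN | NaN | 17.678337 | NaN | 26.938731 | NaN | 221.522976 | NaN | 4842684.0 |
| std | NaN | NaN | 15.96609 | NaN | 4.404016 | NaN | 26.368343 | NaN | 5229238.0 |
| min | NaN | NaN | 0.0 | NaN | 19.0 | NaN | 161.0 | NaN | 30888.0 |
| 20% | NaN | NaN | 4.0 | NaN | 23.0 | NaN | 195.6 | NaN | 947276.0 |
| 40% | NaN | NaN | 10.0 | NaN | 25.0 | NaN | 213.4 | NaN | 1938840.0 |
| 50% | NaN | NaN | 13.0 | NaN | 26.0 | NaN | 220.0 | NaN | 2839073.0 |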
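As a smaller illustration of how the percentile rows are computed, here is a sketch on a hypothetical five-value Series (`values` is made up). Pandas interpolates linearly between neighbouring values, so the 10th percentile of 10, 20, 30, 40, 50 falls between the first two entries.

```python
import pandas as pd

# Hypothetical values chosen so the percentiles are easy to check by hand.
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="Value")

# Request the 10th and 90th percentiles instead of the default quartiles.
summary = values.describe(percentiles=[0.1, 0.9])

print(summary)
```

The labels of the extra rows are built from the requested fractions, so 0.1 becomes `10%` and 0.9 becomes `90%`.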


### 3. Describing String (Object) Data
The describe() method also works with string data, i.e. the object data type. When used on string data, it provides different statistics such as the number of unique values and the most frequent value. This example shows how to apply describe() to a column containing categorical (string) data.

In [7]:
desc = df["Name"].describe()

print(desc)

count             457
unique            457
top       Jeff Withey
freq                1
Name: Name, dtype: object


For string (object) data, describe() provides:

1. count: Total number of non-null values.
2. unique: Number of unique values in the column.
3. top: Most frequent value in the column.
4. freq: Frequency of the most common value.  
This is useful for quickly understanding the distribution of categorical data or identifying the most frequent values.
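The Name column has no repeated values, so top and freq are not very informative there. The sketch below uses a hypothetical `positions` Series with repeats and one missing entry to show how the four statistics behave.

```python
import pandas as pd

# Hypothetical position labels with one missing entry.
positions = pd.Series(["SG", "PG", "SG", "C", "SG", None], name="Position")

summary = positions.describe()
print(summary)

# value_counts() exposes the same information as "top" and "freq":
# its first entry is the most frequent value and how often it occurs.
counts = positions.value_counts()
print(counts.index[0], counts.iloc[0])
```

Note that count is 5 rather than 6, because the missing entry is ignored.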

### 4. Describing Specific Columns with describe()
We may sometimes want to generate a summary for a specific column in our DataFrame. For example, we may be interested in analyzing just the "Salary" column without summarizing the other columns.

In [8]:
salary_desc = df["Salary"].describe()

salary_desc

count    4.460000e+02
mean     4.842684e+06
std      5.229238e+06
min      3.088800e+04
25%      1.044792e+06
50%      2.839073e+06
75%      6.500000e+06
max      2.500000e+07
Name: Salary, dtype: float64

### 5. Describing Data with include='all'
By using the include='all' parameter we can generate a summary for all columns in the DataFrame regardless of data type. This is helpful when we want to analyze both numeric and categorical data at the same time.

In [9]:
desc_all = df.describe(include='all')

desc_all

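| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| count | 457 | 457 | 457.0 | 457 | 457.0 | 457 | 457.0 | 373 | 446.0 |
| unique | 457 | 30 | NaN | 5 | NaN | 18 | NaN | 118 | NaN |
| top | Jeff Withey | New Orleans Pelicans | NaN | SG | NaN | 6-9 | NaN | Kentucky | NaN |
| freq | 1 | 19 | NaN | 102 | NaN | 59 | NaN | 22 | NaN |
| mean | NaN | NaN | 17.678337 | NaN | 26.938731 | NaN | 221.522976 | NaN | 4842684.0 |
| std | NaN | NaN | 15.96609 | NaN | 4.404016 | NaN | 26.368343 | NaN | 5229238.0 |
| min | NaN | NaN | 0.0 | NaN | 19.0 | NaN | 161.0 | NaN | 30888.0 |
| 25% | NaN | NaN | 5.0 | NaN | 24.0 | NaN | 200.0 | NaN | 1044792.0 |
| 50% | NaN | NaN | 13.0 | NaN | 26.0 | NaN | 220.0 | NaN | 2839073.0 |
| 75% | NaN | NaN | 25.0 | NaN | 30.0 | NaN | 240.0 | NaN | 6500000.0 |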
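The exclude parameter listed in the syntax is not shown in the examples, so here is a minimal sketch of it. It uses a hypothetical three-row `sample` DataFrame and NumPy's `np.number` type to select or drop all numeric columns at once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two string columns and one numeric column.
sample = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder", "John Holland"],
    "Position": ["PG", "SF", "SG"],
    "Age": [25.0, 25.0, 27.0],
})

# Keep only numeric columns in the summary.
numeric_only = sample.describe(include=[np.number])

# Drop numeric columns, leaving only the string columns.
non_numeric = sample.describe(exclude=[np.number])

print(numeric_only)
print(non_numeric)
```

Splitting the summary this way avoids the many NaN cells that appear when numeric and string statistics share one table.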


Counting values in a Pandas DataFrame is important for understanding the distribution of data, checking for missing values and summarizing data. In this article, we will learn various methods to count values in a Pandas DataFrame.

We will be using a sample DataFrame to learn about various methods:

In [11]:
data = {
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Alice'],
    'Age': [25, 30, 25, 35, 30, 25],
    'City': ['New York', 'Chicago', 'New York', 'San Francisco', 'Chicago', 'New York']
}

df = pd.DataFrame(data)
df

| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Chicago |
| 2 | Alice | 25 | New York |
| 3 | Charlie | 35 | San Francisco |
| 4 | Bob | 30 | Chicago |
| 5 | Alice | 25 | New York |


### 1. Counting Unique Values in a Column
To count the unique values in a specific column of a DataFrame we can use the nunique() method. This method returns the number of unique values in the column.

In [12]:
unique_names = df['Name'].nunique()
print(unique_names)

3


### 2. Counting Non-Null Values
Pandas provides the count() method to count non-null values in a DataFrame or a specific column. This method excludes NaN values.

In [14]:
non_null_ages = df['Age'].count()
print(non_null_ages)

6


### 3. Counting Missing (Null) Values
To count the number of missing or null values in a DataFrame we can use the isnull() function along with sum(). This combination will return the count of missing values in each column.

In [15]:
# Count missing values for 'Age' column in the DataFrame
missing_age_values = df['Age'].isnull().sum()
print(missing_age_values)

0
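The same pattern works on a whole DataFrame, where it returns one count per column. This sketch assumes a hypothetical `sample` DataFrame that actually contains missing values, since the example DataFrame above has none.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values in both columns.
sample = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "Charlie"],
    "Age": [25, np.nan, 25, np.nan],
})

# isnull() marks each missing cell as True; sum() adds them up per column.
missing_per_column = sample.isnull().sum()

# Summing the per-column counts gives the total for the whole DataFrame.
total_missing = int(missing_per_column.sum())

# count() is the complement: non-null values per column.
non_null_per_column = sample.count()

print(missing_per_column)
print("Total missing:", total_missing)
print(non_null_per_column)
```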


### 4. Using value_counts() to Count Occurrences
The value_counts() method is one of the most frequently used functions for counting values in a Pandas DataFrame. It returns the frequency of each unique value in a column, ordered from most to least frequent.

In [18]:
df['Name'].value_counts()

Name
Alice      3
Bob        2
Charlie    1
Name: count, dtype: int64
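When proportions are more useful than raw counts, value_counts() accepts normalize=True. The sketch below rebuilds the Name column as a standalone Series so that it runs on its own.

```python
import pandas as pd

# Same names as in the sample DataFrame, rebuilt as a standalone Series.
names = pd.Series(
    ["Alice", "Bob", "Alice", "Charlie", "Bob", "Alice"], name="Name"
)

# normalize=True returns each value's share of the total instead of a count.
shares = names.value_counts(normalize=True)

print(shares)
```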

### 5. Handling NaN Values During Counting
By default, value_counts() excludes NaN values. If we want to include NaN in the count, we can pass the dropna=False argument:

In [20]:
import numpy as np
# Adding a NaN value in the 'City' column at index 2
df.loc[2, 'City'] = np.nan

# Include NaN values in the count
nan_included_counts = df['City'].value_counts(dropna=False)
print(nan_included_counts)

City
New York         2
Chicago          2
NaN              1
San Francisco    1
Name: count, dtype: int64


### 6. Count Values by Grouping Data
We can also count values in different groups using the groupby() method. This is useful when we want to count occurrences of values within each category of another column.

In [21]:
# Count occurrences of values in 'Name' column grouped by 'Age'
grouped_counts = df.groupby('Age')['Name'].value_counts()
print(grouped_counts)

Age  Name   
25   Alice      3
30   Bob        2
35   Charlie    1
Name: count, dtype: int64
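Two related group-level counts are size() and count(), which differ in how they treat missing values. The following sketch uses a hypothetical `people` DataFrame with one missing city to show the difference.

```python
import pandas as pd

# Hypothetical data: Alice has one row with a missing city.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Charlie", "Bob", "Alice"],
    "City": ["New York", "Chicago", None, "San Francisco", "Chicago", "New York"],
})

# size() counts every row in each group, including rows with missing values.
rows_per_name = people.groupby("Name").size()

# count() counts only the non-null values of the selected column.
cities_per_name = people.groupby("Name")["City"].count()

print(rows_per_name)
print(cities_per_name)
```

Comparing the two results is a quick way to see which groups contain missing data.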


The head() method lets us quickly preview the structure and contents of our dataset without printing everything. By default it returns the first five rows, but this can be customized to return any number of rows. It is commonly used to verify that data has been loaded correctly, check column names and inspect the initial records.

Let's see an example of using head() on a DataFrame.

We will use the head() method to retrieve the first few rows of the DataFrame. This provides a quick preview of the dataset’s structure and contents.

In [23]:
data = pd.read_csv("nba.csv")

print("NBA Dataset (First 5 Rows):")
data.head()

NBA Dataset (First 5 Rows):


| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |


##### Syntax:  

DataFrame.head(n=5)  

Series.head(n=5)  

Parameter:  
 
n: Number of rows to retrieve from the top of the DataFrame or Series.
Return: It returns the first n rows of the DataFrame or Series as a new DataFrame or Series.
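The parameter n can also be negative, in which case head() returns everything except the last |n| rows. The sketch below shows this on a hypothetical five-element Series (`numbers`), along with the equivalent positional slice.

```python
import pandas as pd

# Hypothetical Series used only for illustration.
numbers = pd.Series([10, 20, 30, 40, 50], name="Number")

# Positive n: the first n rows.
first_two = numbers.head(2)

# Negative n: all rows except the last |n| rows.
all_but_last_two = numbers.head(-2)

# head(n) selects the same rows as a positional slice with iloc.
same_as_slice = numbers.iloc[:2]

print(first_two)
print(all_but_last_two)
```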

### Example of head() method
Let's see some other examples for better understanding.

#### 1. Using head() with a Custom Number of Rows
The default number of rows returned by head() is 5, but we can customize this number to preview a specific subset of the data. This is useful when we want to see more or fewer rows.


In [24]:
series = data["Name"]

top = series.head(n = 7)

print("NBA Dataset (First 7 Rows):")
print(top)

NBA Dataset (First 7 Rows):
0    Avery Bradley
1      Jae Crowder
2     John Holland
3      R.J. Hunter
4    Jonas Jerebko
5     Amir Johnson
6    Jordan Mickey
Name: Name, dtype: object


#### 2. Using head() on a Series
The head() method can also be used on a Pandas Series to retrieve the first few elements. This is useful when we're working with a single column of data and want to quickly inspect its contents.

In [25]:
salary = data['Salary']

print("First 5 Salaries:")
print(salary.head())

First 5 Salaries:
0    7730337.0
1    6796117.0
2          NaN
3    1148640.0
4    5000000.0
Name: Salary, dtype: float64


#### 3. Previewing Specific Columns with head()
We can also use the head() method to preview specific columns of our dataset. This example focuses on previewing just the "Name" and "Salary" columns.

In [27]:
salary_name = data[['Name', 'Salary']].head()

print("First 5 Rows of Name and Salary Columns:")
salary_name

First 5 Rows of Name and Salary Columns:


| | Name | Salary |
|---|---|---|
| 0 | Avery Bradley | 7730337.0 |
| 1 | Jae Crowder | 6796117.0 |
| 2 | John Holland | NaN |
| 3 | R.J. Hunter | 1148640.0 |
| 4 | Jonas Jerebko | 5000000.0 |


#### 4. Using head() After Sorting a DataFrame
Here we will use head() to inspect the first few rows of a DataFrame after sorting it by a specific column. This is useful when we want to identify the top records based on a specific criterion like the highest salary or the youngest player.

In [31]:
sorted_data = data.sort_values(by='Age', ascending=True)

top_sorted = sorted_data.head()

print("First 5 Rows After Sorting by Age:")
top_sorted

First 5 Rows After Sorting by Age:


| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| 226 | Rashad Vaughn | Milwaukee Bucks | 20.0 | SG | 19.0 | 6-6 | 202.0 | UNLV | 1733040.0 |
| 122 | Devin Booker | Phoenix Suns | 1.0 | SG | 19.0 | 6-6 | 206.0 | Kentucky | 2127840.0 |
| 452 | Trey Lyles | Utah Jazz | 41.0 | PF | 20.0 | 6-10 | 234.0 | Kentucky | 2239800.0 |
| 427 | Cliff Alexander | Portland Trail Blazers | 34.0 | PF | 20.0 | 6-8 | 240.0 | Kansas | 525093.0 |
| 410 | Karl-Anthony Towns | Minnesota Timberwolves | 32.0 | C | 20.0 | 7-0 | 244.0 | Kentucky | 5703600.0 |
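As an alternative to sorting and then calling head(), Pandas also offers nsmallest() and nlargest(), which pick the extreme rows directly. This is a sketch on a hypothetical `players` DataFrame with made-up names and ages, not the NBA data.

```python
import pandas as pd

# Hypothetical players used only for illustration.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [25.0, 19.0, 31.0, 20.0],
})

# Approach from the example above: sort, then take the first rows.
youngest_sorted = players.sort_values(by="Age").head(2)

# Direct approach: ask for the two rows with the smallest Age.
youngest_direct = players.nsmallest(2, "Age")

print(youngest_sorted)
print(youngest_direct)
```

Both approaches select the same rows here; nsmallest() simply states the intent in a single call.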


The tail() method allows us to quickly preview the last few rows of a DataFrame or Series. This method is useful for data exploration as it helps us to inspect the bottom of the dataset without printing everything. It is commonly used to verify that data has been loaded correctly, check the last records and inspect the data towards the end of a dataset.  

Let's see an example: Using tail() on a DataFrame  

We will use the tail() method to retrieve the last few rows of a DataFrame. This provides a quick preview of the dataset’s structure and contents. By default, the method returns the last 5 rows of the DataFrame as a new DataFrame, but we can customize this to return any number of rows.

In [32]:
print("NBA Dataset (Last 5 Rows):")
print(data.tail())

NBA Dataset (Last 5 Rows):
             Name       Team  Number Position   Age Height  Weight College  \
453  Shelvin Mack  Utah Jazz     8.0       PG  26.0    6-3   203.0  Butler   
454     Raul Neto  Utah Jazz    25.0       PG  24.0    6-1   179.0     NaN   
455  Tibor Pleiss  Utah Jazz    21.0        C  26.0    7-3   256.0     NaN   
456   Jeff Withey  Utah Jazz    24.0        C  26.0    7-0   231.0  Kansas   
457           NaN        NaN     NaN      NaN   NaN    NaN     NaN     NaN   

        Salary  
453  2433333.0  
454   900000.0  
455  2900000.0  
456   947276.0  
457        NaN  


The tail() method returns the last 5 rows of the dataset, which provides a quick preview of the columns and their respective values at the bottom of the dataset.

Syntax:  

DataFrame.tail(n=5)  

Series.tail(n=5)

Parameter:  

n: Number of rows to retrieve from the bottom of the DataFrame or Series.
Return: It returns the last n rows of the DataFrame or Series as a new DataFrame or Series.

### Example of tail() method
Let’s see other examples for a better understanding.

#### 1. Using tail() with a Custom Number of Rows
The default number of rows returned by tail() is 5, but we can customize this number to preview a specific subset of the data. This is useful when we want to see more or fewer rows.

In [33]:
series = data["Name"]

bottom = series.tail(n = 7)

print("NBA Dataset (Last 7 Rows):")
print(bottom)

NBA Dataset (Last 7 Rows):
451    Chris Johnson
452       Trey Lyles
453     Shelvin Mack
454        Raul Neto
455     Tibor Pleiss
456      Jeff Withey
457              NaN
Name: Name, dtype: object


#### 2. Using tail() on a Series
The tail() method can also be used on a Pandas Series to retrieve the last few elements. This is useful when we're working with a single column of data and want to quickly inspect its contents.

In [34]:
salary = data['Salary']

print("Last 5 Salaries:")
print(salary.tail())

Last 5 Salaries:
453    2433333.0
454     900000.0
455    2900000.0
456     947276.0
457          NaN
Name: Salary, dtype: float64


#### 3. Previewing Specific Columns with tail()
We can also use the tail() method to preview specific columns of our dataset. This example focuses on previewing just the "Name" and "Salary" columns.

In [35]:
salary_name = data[['Name', 'Salary']].tail()

print("Last 5 Rows of Name and Salary Columns:")
print(salary_name)

Last 5 Rows of Name and Salary Columns:
             Name     Salary
453  Shelvin Mack  2433333.0
454     Raul Neto   900000.0
455  Tibor Pleiss  2900000.0
456   Jeff Withey   947276.0
457           NaN        NaN


#### 4. Using tail() After Sorting a DataFrame
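In this example, we will sort the DataFrame by the "Age" column in descending order and then use tail() to preview the rows at the bottom, which are the youngest players. Because sort_values() places missing values last by default, the final row (index 457, which has no data) appears at the very end. This shows how tail() can be used in combination with sorting to inspect specific records.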

In [37]:
sorted_data = data.sort_values(by='Age', ascending=False)

bottom_sorted = sorted_data.tail()

print("Last 5 Rows After Sorting by Age:")
bottom_sorted

Last 5 Rows After Sorting by Age:


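| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| 452 | Trey Lyles | Utah Jazz | 41.0 | PF | 20.0 | 6-10 | 234.0 | Kentucky | 2239800.0 |
| 410 | Karl-Anthony Towns | Minnesota Timberwolves | 32.0 | C | 20.0 | 7-0 | 244.0 | Kentucky | 5703600.0 |
| 226 | Rashad Vaughn | Milwaukee Bucks | 20.0 | SG | 19.0 | 6-6 | 202.0 | UNLV | 1733040.0 |
| 122 | Devin Booker | Phoenix Suns | 1.0 | SG | 19.0 | 6-6 | 206.0 | Kentucky | 2127840.0 |
| 457 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |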
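The empty row at index 457 shows up in the output because sort_values() puts missing values at the end regardless of sort direction. One way to keep it out of the preview is to drop rows with a missing Age before sorting. The sketch below assumes a hypothetical `players` DataFrame with made-up names and one empty row.

```python
import numpy as np
import pandas as pd

# Hypothetical players; the last row is entirely missing.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", None],
    "Age": [25.0, 19.0, 31.0, np.nan],
})

# Sorting in descending order still leaves the missing Age at the bottom.
by_age_desc = players.sort_values(by="Age", ascending=False)
print(by_age_desc.tail(2))

# Dropping rows without an Age first keeps tail() focused on real records.
youngest = (
    players.dropna(subset=["Age"])
    .sort_values(by="Age", ascending=False)
    .tail(2)
)
print(youngest)
```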
