## Quick Dataframe Summaries in Pandas
A collection of methods and attributes for quickly inspecting DataFrames and their properties.

##### Avoiding Truncated Output from the Following Functions on Larger Datasets

In [None]:
import pandas as pd

pd.set_option('display.max_columns', None)  # None removes the column display limit
pd.set_option('display.max_rows', None)     # None removes the row display limit
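
The two options above stay in effect for the rest of the session. If the limits should be lifted for a single display only, `pd.option_context` is one way to do it. A minimal sketch, assuming a made-up wide frame called `df_demo` (not part of the original notebook):

```python
import pandas as pd

# Hypothetical wide frame, used only for illustration
df_demo = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

default_max_cols = pd.get_option("display.max_columns")

# The limits are lifted only inside the with-block
with pd.option_context("display.max_columns", None, "display.max_rows", None):
    inside_max_cols = pd.get_option("display.max_columns")
    print(df_demo)

# The previous setting is restored on exit
restored_max_cols = pd.get_option("display.max_columns")
```

Printing a very large frame without limits can flood the notebook, so the scoped version may be the safer default.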

##### Quick Dataframe Summary Methods

In [None]:
print(df.shape)

"""
.shape attribute:
This returns a tuple of the form (number of rows, number of columns) giving the dimensions of the DataFrame.
"""
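
Because `.shape` is a plain tuple, it can be unpacked directly. A small sketch, using a made-up frame `df_demo` for illustration:

```python
import pandas as pd

# Hypothetical frame, used only for illustration
df_demo = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Unpack the (rows, columns) tuple into two names
n_rows, n_cols = df_demo.shape
print(f"{n_rows} rows x {n_cols} columns")
```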

In [None]:
df.info()

"""
.info() method:
This prints a concise summary of the DataFrame: the index range, the number of non-null entries in
each column, the data type of each column, and the memory usage. It's very useful for a quick overview
of the data, especially for spotting missing values and unexpected data types.
"""
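
Two `info()` parameters that can be handy: `memory_usage="deep"` for a more accurate memory figure on text columns, and `buf` for capturing the summary as a string, since `info()` prints its output and returns `None`. A sketch, assuming a made-up frame `df_demo`:

```python
import io

import pandas as pd

# Hypothetical frame with one missing value, used only for illustration
df_demo = pd.DataFrame({"city": ["Oslo", "Lima", None], "temp": [3.5, 18.0, 21.2]})

# "deep" inspects the actual contents of object/string columns
df_demo.info(memory_usage="deep")

# Pass a buffer to capture the text instead of printing it
buffer = io.StringIO()
df_demo.info(buf=buffer)
summary = buffer.getvalue()
```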

In [None]:
df.describe()

"""
.describe() method: 
This gives you a statistical summary of the numerical columns in the DataFrame, 
including count, mean, standard deviation, min, max, and quartiles. It's handy for 
getting a sense of the distribution of your data.
"""
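
By default `describe()` skips non-numeric columns when the frame contains any numeric ones. Passing `include="all"` adds `unique`, `top`, and `freq` rows for the remaining columns. A sketch with a made-up frame `df_demo`:

```python
import pandas as pd

# Hypothetical frame with one text and one numeric column
df_demo = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "temp": [3.5, 18.0, 21.2]})

# Default: numeric columns only
numeric_summary = df_demo.describe()

# include="all": every column, with NaN where a statistic does not apply
full_summary = df_demo.describe(include="all")
print(full_summary)
```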

In [None]:
# A notebook cell only displays its last expression, so print both explicitly
print(df.head())  # First 5 rows
print(df.tail())  # Last 5 rows

"""
.head(n) and .tail(n) methods:
These methods return the first n and last n rows of the DataFrame respectively, allowing you to
quickly inspect the actual data. If n is not specified, it defaults to 5.
"""
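
A short sketch of passing `n` explicitly, using a made-up ten-row frame `df_demo`. A negative `n` returns everything except the last (for `head`) or first (for `tail`) `|n|` rows:

```python
import pandas as pd

# Hypothetical ten-row frame, used only for illustration
df_demo = pd.DataFrame({"n": range(10)})

first_three = df_demo.head(3)        # rows 0, 1, 2
last_two = df_demo.tail(2)           # rows 8, 9
all_but_last_two = df_demo.head(-2)  # rows 0 through 7
```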

In [None]:
print(df.dtypes)

"""
.dtypes attribute: 
This gives you the data type of each column in the DataFrame.
"""
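
On wide frames the full `.dtypes` listing can get long. One possible follow-up is to count columns per type, or to pull out column names by type with `select_dtypes`. A sketch, assuming a made-up frame `df_demo`:

```python
import pandas as pd

# Hypothetical frame mixing text, float, and integer columns
df_demo = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [3.5, 18.0], "visits": [2, 5]})

# How many columns there are of each data type
dtype_counts = df_demo.dtypes.value_counts()
print(dtype_counts)

# Names of the numeric columns only
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
```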

In [None]:
df.nunique()

"""
.nunique() method: 
This returns the number of unique values in each column. 
It's useful for understanding the diversity of data in each feature.
"""
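
Note that `nunique()` ignores missing values by default. Passing `dropna=False` counts them as a distinct value. A sketch with a made-up frame `df_demo`:

```python
import pandas as pd

# Hypothetical frame: "grade" has a missing value, "score" is constant
df_demo = pd.DataFrame({"grade": ["a", "b", "a", None], "score": [1, 1, 1, 1]})

unique_counts = df_demo.nunique()                # missing values excluded
unique_with_nan = df_demo.nunique(dropna=False)  # missing counted as a value
```

A column with a single unique value, like `score` here, carries no information and is often a candidate for dropping.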

In [None]:
df.isnull().sum()

"""
.isnull().sum(): 
This chain of methods helps identify how many missing (NaN) values are in each column, 
which is crucial for cleaning and preprocessing data.
"""
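
Raw counts are hard to judge without the row total. Since `isnull()` yields booleans, taking the mean instead of the sum gives the fraction missing per column. A sketch, assuming a made-up frame `df_demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in both columns
df_demo = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan], "b": ["x", None, "z", "w"]})

missing_counts = df_demo.isnull().sum()       # number of missing values per column
missing_pct = df_demo.isnull().mean() * 100   # percentage missing per column
print(missing_pct)
```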

In [None]:
df.duplicated().sum()

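"""
.duplicated().sum():
.duplicated() flags every row that is an exact copy of an earlier row, and .sum() counts those flags,
giving the number of duplicate rows in the DataFrame. The first occurrence of each row is not counted.
"""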
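
`duplicated()` also accepts `subset` to compare only some columns and `keep=False` to flag every copy, including the first. A sketch with a made-up frame `df_demo`:

```python
import pandas as pd

# Hypothetical frame: the rows at index 1 and 2 are identical
df_demo = pd.DataFrame({"id": [1, 2, 2, 3], "val": ["a", "b", "b", "b"]})

# Whole-row duplicates (first occurrence not counted)
n_repeat_rows = df_demo.duplicated().sum()

# Duplicates judged on the "val" column alone
n_repeat_vals = df_demo.duplicated(subset=["val"]).sum()

# keep=False flags all copies, which is useful for inspecting them side by side
all_copies = df_demo[df_demo.duplicated(keep=False)]
```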

In [None]:
df.sample()

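"""
.sample(n) method:
This returns n randomly selected rows from the DataFrame; if n is not specified, it returns a single row.
It is particularly helpful on a large dataset, where the first and last rows may not be typical:
a larger random sample gives a quick snapshot that is more representative of the whole.
"""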
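
A sketch of the main `sample()` parameters, assuming a made-up 100-row frame `df_demo`: `n` for a fixed number of rows, `frac` for a fraction of the rows, and `random_state` to make the draw reproducible.

```python
import pandas as pd

# Hypothetical 100-row frame, used only for illustration
df_demo = pd.DataFrame({"n": range(100)})

five_rows = df_demo.sample(n=5, random_state=0)         # exactly 5 rows
ten_percent = df_demo.sample(frac=0.1, random_state=0)  # 10% of the rows

# The same random_state reproduces the same draw
same_again = df_demo.sample(n=5, random_state=0)
```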