## Summary

Assessing your data is the second step in the data-wrangling process. When assessing data, data wranglers inspect their datasets for two kinds of issues:

> **Data quality** issues, such as missing, duplicate, or incorrect data. Data with quality issues is called **dirty data**; inaccurate data is one example. High data quality is crucial for obtaining reliable insights and making informed decisions.


> **Data structure** issues, such as data laid out in different formats. Data with structural issues is called **messy data**; unorganized data is one example. Once data is tidy (each variable in its own column and each observation in its own row), we can focus on further assessment and cleaning instead of first wrestling the data into a structure that is easy to view and parse.
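
One common structural problem is a variable stored in column headers. The sketch below, which assumes pandas and uses a made-up population table, reshapes such a table with `DataFrame.melt` so that each row holds exactly one observation. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical messy table: the "year" variable is hidden in the column headers.
messy = pd.DataFrame({
    "country": ["A", "B"],
    "2020": [100, 200],
    "2021": [110, 210],
})

# melt() turns the year columns into rows: one row per (country, year) observation.
tidy = messy.melt(id_vars="country", var_name="year", value_name="population")
print(tidy)
```

The values themselves are unchanged; only the layout differs, which is what separates a structural (messy) issue from a quality (dirty) one.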


You can search for these issues in two ways:

- **Visually**, by scrolling through the data
- **Programmatically**, by using code

Document each issue as soon as you detect it; this makes the cleaning step easier.

## Dimensions of Data Quality
Let's recap the key dimensions of data quality:

- **Completeness** measures whether your data is sufficient to answer interesting questions or solve your problem.
- **Validity** measures how well your data conforms to a defined set of rules, also known as a schema.
- **Accuracy** measures whether your data correctly represents the reality it aims to depict.
- **Consistency** measures two things: whether your data follows a standard format, and whether its information matches that of other data sources.
- **Uniqueness** measures whether your data contains duplicate or overlapping values.
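
Several of these dimensions can be turned into simple ratios with pandas. The sketch below uses a small, made-up table; the column names, the validity rule (ages between 0 and 120), and the format rule (dates written as YYYY-MM-DD) are assumptions for illustration. Accuracy is left out because checking it usually requires comparing against a trusted outside source.

```python
import pandas as pd

# Hypothetical patient records with deliberate problems.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "age": [34, -5, -5, None],
    "visit_date": ["2021-03-01", "03/02/2021", "03/02/2021", "2021-03-04"],
})

# Completeness: share of non-missing cells in each column.
completeness = df.notnull().mean()

# Validity: share of ages satisfying an assumed rule (0 <= age <= 120).
# Missing ages count as invalid because between() returns False for NaN.
validity = df["age"].between(0, 120).mean()

# Consistency: share of dates following an assumed YYYY-MM-DD format.
consistency = df["visit_date"].str.fullmatch(r"\d{4}-\d{2}-\d{2}").mean()

# Uniqueness: share of rows that are not exact repeats of an earlier row.
uniqueness = 1 - df.duplicated().mean()

print(completeness, validity, consistency, uniqueness, sep="\n")
```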

## Assessment
Inspect records (visual and programmatic assessment)

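    # df is an existing pandas DataFrame
    df.head()     # first 5 rows by default
    df.tail()     # last 5 rows by default
    df.sample()   # 1 random row by default; pass n for more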
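
A small runnable sketch of inspecting records on a made-up 10-row DataFrame. The data is hypothetical, and `random_state` is passed to `sample()` only so the random draw is reproducible.

```python
import pandas as pd

# Hypothetical dataset with 10 rows.
df = pd.DataFrame({
    "id": range(10),
    "score": [55, 61, 70, 48, 90, 85, 77, 66, 59, 73],
})

first = df.head(3)                          # first 3 rows
last = df.tail(3)                           # last 3 rows
random_rows = df.sample(3, random_state=0)  # 3 random rows, reproducible

print(first, last, random_rows, sep="\n\n")
```

Because files are often sorted, a random sample can surface problems in the middle of the data that `head()` and `tail()` would miss.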

Summarize the DataFrame

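    df.info()       # prints column dtypes, non-null counts, and memory usage
    df.describe()   # summary statistics (numeric columns by default)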
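
A sketch of what the summary methods reveal on a made-up table. `info()` prints its report and returns `None`, so the same non-null counts are also read with `count()` here; `describe()` returns a DataFrame of statistics for the numeric columns. The names and scores are hypothetical.

```python
import pandas as pd

# Hypothetical dataset: one missing score and one implausible negative score.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "score": [88.0, None, -3.0, 92.0],
})

df.info()  # prints dtypes and non-null counts; "score" shows 3 non-null of 4 entries

stats = df.describe()  # count, mean, std, min, quartiles, max for numeric columns
print(stats)

non_null = df.count()  # non-null counts per column, as a Series
```

Assuming scores cannot be negative, the minimum of -3.0 in `describe()` is the kind of value worth documenting as a possible validity issue.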

Retrieve specific information

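    df.isnull().sum()          # number of missing values in each column
    df['col'].sort_values()    # column values in ascending order ('col' is a placeholder)
    df['col'].value_counts()   # count of each distinct value in a column
    df.duplicated().sum()      # number of rows that repeat an earlier row
    df['col'].min()            # smallest value in a column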
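
The same methods applied to a made-up orders table with a missing value, a duplicated row, and an inconsistent category label. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical orders table with several planted issues.
df = pd.DataFrame({
    "order_id": [101, 102, 102, 104, 105],
    "status": ["shipped", "Shipped", "Shipped", "pending", None],
    "amount": [20.5, 15.0, 15.0, 0.0, 42.0],
})

missing_per_column = df.isnull().sum()       # missing values per column
sorted_amounts = df["amount"].sort_values()  # extremes appear at the two ends
status_counts = df["status"].value_counts()  # "shipped" vs "Shipped" shows inconsistent casing
duplicate_rows = df.duplicated().sum()       # rows that repeat an earlier row
smallest_amount = df["amount"].min()         # a 0.0 amount may deserve a closer look

print(missing_per_column, sorted_amounts, status_counts,
      duplicate_rows, smallest_amount, sep="\n\n")
```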
