The cells below are an example of how this script should look; they do not represent the full script.

## Exploratory Data Analysis

- **Objective**: 
    - check data quality
    - check the row and column counts (shape)
    - understand the data

### Data dictionary

| Column Name                | Description                                                              | Use                                                                                  |
|----------------------------|--------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| trans_date_trans_time      | The date and time when the transaction occurred.                        | Used for time-series analysis and understanding transaction patterns over time.     |
| cc_num                     | The credit card number of the customer (masked or anonymized for privacy). | Identifies which customer made the transaction, useful for tracking customer behavior. |
| merchant                   | The name of the merchant where the transaction took place.              | Helps analyze spending habits by merchant and identify potentially fraudulent merchants. |
| category                   | The category of the merchant (e.g., miscellaneous, grocery, etc.).     | Used for categorizing transactions and analyzing spending by category.               |
| amt                        | The amount of money involved in the transaction.                        | Important for calculating total spending, fraud detection, and transaction analysis. |
| first                      | The first name of the credit card holder.                               | Useful for personalization and customer segmentation.                               |
| last                       | The last name of the credit card holder.                                | Similar to the first name, helps in personalization and customer identification.    |
| gender                     | The gender of the credit card holder.                                   | Used for demographic analysis and understanding spending patterns by gender.        |
| street                     | The street address of the credit card holder.                           | Provides location data for geographic analysis of transactions.                     |
| city                       | The city of the credit card holder.                                     | Used for geographic analysis and understanding local spending trends.              |
| state                      | The state of the credit card holder.                                    | Important for regional analysis and fraud detection.                                |
| zip                        | The zip code of the credit card holder.                                 | Used for local analysis and understanding demographic distributions.                 |
| lat                        | The latitude of the credit card holder's location.                      | Used for mapping transactions and analyzing geographic trends.                      |
| long                       | The longitude of the credit card holder's location.                     | Similar to latitude, helps in geographic analysis.                                  |
| city_pop                   | The population of the city where the credit card holder resides.        | Useful for demographic analysis and understanding market size.                      |
| job                        | The job title of the credit card holder.                                | Provides insights into spending patterns based on occupation.                       |
| dob                        | The date of birth of the credit card holder.                            | Used for age analysis and understanding customer demographics.                      |
| trans_num                  | A unique identifier for the transaction.                                | Essential for tracking individual transactions and preventing duplicates.            |
| unix_time                  | The UNIX timestamp of the transaction.                                  | Useful for time-based operations and calculations.                                  |
| merch_lat                  | The latitude of the merchant's location.                                | Important for analyzing merchant locations and trends.                              |
| merch_long                 | The longitude of the merchant's location.                               | Similar to merch_lat, helps in geographic analysis.                                 |
| is_fraud                   | A flag indicating whether the transaction is fraudulent (1 for fraud, 0 for legitimate). | Essential for training fraud detection models and evaluating transaction legitimacy.  |

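Several descriptions in the data dictionary imply checks that can be run directly: `trans_num` should be unique, `is_fraud` should only hold 0 or 1, and the coordinates should fall inside valid latitude and longitude ranges. The following is a minimal sketch of such checks. The `sample` rows are made up for illustration and the helper name `check_against_dictionary` is hypothetical; neither comes from the dataset or the original script.

```python
import pandas as pd

# Made-up rows that mirror a few columns from the data dictionary.
# The last row deliberately repeats a trans_num to show a failing check.
sample = pd.DataFrame(
    {
        "trans_num": ["a1", "b2", "c3", "c3"],
        "lat": [36.08, 48.89, 42.18, 42.18],
        "long": [-81.18, -118.21, -112.26, -112.26],
        "is_fraud": [0, 0, 1, 1],
    }
)


def check_against_dictionary(frame: pd.DataFrame) -> dict:
    """Count rows that violate expectations stated in the data dictionary.

    Missing coordinates are counted as out of range, because
    Series.between returns False for NaN.
    """
    return {
        "duplicate_trans_num": int(frame["trans_num"].duplicated().sum()),
        "invalid_is_fraud": int((~frame["is_fraud"].isin([0, 1])).sum()),
        "lat_out_of_range": int((~frame["lat"].between(-90, 90)).sum()),
        "long_out_of_range": int((~frame["long"].between(-180, 180)).sum()),
    }


report = check_against_dictionary(sample)
print(report)
```

On the real data, the same helper could be called as `check_against_dictionary(df)` once the DataFrame has been loaded; any non-zero count points to rows worth inspecting before modelling.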


In [None]:
# import the required libraries
import pandas as pd

In [None]:
# read data
data_loc = '../../data/fraudTrain.csv'

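# load only the first 5 rows for this example (nrows avoids parsing the whole file)
df = pd.read_csv(data_loc, index_col=0, nrows=5)
# transpose the first row so every column and its value is visible
df.head(1).T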

In [None]:
# flag the columns that contain at least one missing value
has_missing = df.isnull().sum() > 0
print(f'Number of columns with missing values: {has_missing.sum()}')
print(f"Check df shape -> rows, columns: {df.shape}")

In [None]:
# check the data type of each column
df.dtypes
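
When a CSV is read without explicit parsing options, date-like columns such as `trans_date_trans_time` and `dob` typically load as plain strings rather than datetimes, so the `dtypes` output is a natural place to spot columns that need converting. A minimal sketch of that conversion follows. The `sample` rows and the timestamp format are assumptions made for illustration, not values taken from the file.

```python
import pandas as pd

# Made-up rows; the timestamp and date formats are assumptions about the CSV.
sample = pd.DataFrame(
    {
        "trans_date_trans_time": ["2019-01-01 00:00:18", "2019-01-01 00:00:44"],
        "dob": ["1988-03-09", "1978-06-21"],
        "category": ["misc_net", "grocery_pos"],
    }
)

converted = sample.assign(
    # errors="coerce" turns unparseable values into NaT instead of raising
    trans_date_trans_time=pd.to_datetime(
        sample["trans_date_trans_time"], errors="coerce"
    ),
    dob=pd.to_datetime(sample["dob"], errors="coerce"),
    # a low-cardinality text column can be stored as a pandas categorical
    category=sample["category"].astype("category"),
)

print(converted.dtypes)
```

Counting `NaT` values after the conversion (for example `converted["dob"].isna().sum()`) shows how many entries failed to parse, which is itself a data quality signal.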