# Bitstamp Assessment Test
## Task 1: Where are my assets?

Welcome to crypto! The Product team is seeking insights into the performance of our lending product. Your mission is to:

1. Clean (if needed) and enrich the attached data (*task_1_earn.csv*). The end goal is a clean table that can be
used for any kind of analytics without further manipulation. The data includes all completed lending
withdrawals (a withdrawal is a user's request to stop lending). Add yearweek and yearmonth columns and any others
that will be useful to end users.
Definitions:
    - **User_Id** is the id of the user
    - **Id** is the identifier of the withdrawal request
    - **Requested_at** is when the user made the request to unlend
    - **Finished_at** is when the lending provider completed the lending and the user got the funds

2. Prepare an analysis of the lending product with the data you have so the product team will be able to identify if we
have any issues with our lending provider. Include numbers and graphs and don't forget to write key findings.
    - Identify key trends, patterns, and potential issues with the lending provider.

3. Based on your analysis:
    - **Identify opportunities** to improve the **performance** and **reliability** of our lending provider.
    - Suggest actionable **recommendations** for the Product team to address these issues.

##### Import libraries
Import the main libraries for data analysis and visualisation.

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

sns.set_theme(style="whitegrid")


##### Load Data

Import data from the *task_1_earn.csv* file. If the file is not in the same directory as this notebook, the tester should change the **file_path_1** variable to point to it.

In [12]:
# Load the CSV file into a DataFrame, with a check for file existence
file_path_1 = 'task_1_earn.csv'  # Path to the CSV file (change if the tester's file is in a different directory)
try:
    df_1 = pd.read_csv(file_path_1)
    print("Data loaded successfully!")
except FileNotFoundError:
    print(f"File not found at {file_path_1}. Please check the path.")
    raise  # Stop here: the cells below need df_1

# Display the first few rows of the DataFrame
df_1.head()


Data loaded successfully!


|   | currency | user_id | id | amount_native | amount_usd | requested_at | finished_at |
|---|----------|---------|----|---------------|------------|--------------|-------------|
| 0 | BTC | 44017161 | 117200 | 17.44995 | 1234.82563 | 2020-11-24 22:59:35 | 2020-01-25 14:00:02 |
| 1 | MATIC | 46740482 | 117197 | 17.450786 | 546.674743 | 2020-01-24 22:33:10 | 2020-01-25 02:00:20 |
| 2 | PEPE | 46489105 | 117194 | 17.446612 | 556.810541 | 2020-01-24 22:12:18 | 2020-01-25 02:00:19 |
| 3 | PEPE | 46117080 | 117193 | 17.446693 | 1045.785866 | 2020-01-24 22:03:20 | 2020-01-25 02:00:19 |
| 4 | AVAX | 47626266 | 117191 | 17.45417 | 653.661058 | 2020-01-24 22:01:30 | 2020-01-25 02:00:41 |


In [16]:
# Ensure date columns are in datetime format
if 'requested_at' in df_1.columns and 'finished_at' in df_1.columns:
    df_1['requested_at'] = pd.to_datetime(df_1['requested_at'], errors='coerce')
    df_1['finished_at'] = pd.to_datetime(df_1['finished_at'], errors='coerce')

# Check for and print duplicate rows
duplicate_count = df_1.duplicated().sum()
print(f"Number of duplicate rows: {duplicate_count}")

# Remove duplicates
df_1 = df_1.drop_duplicates()

# Check for and print rows with all blank or NaN values
blank_row_count = df_1.isnull().all(axis=1).sum()
print(f"Number of completely blank (NaN) rows: {blank_row_count}")

# Remove completely blank rows
df_1 = df_1.dropna(how='all')

# Check for missing values column-wise
missing_values = df_1.isnull().sum()
print("\nMissing values in each column:")
print(missing_values)



Number of duplicate rows: 0
Number of completely blank (NaN) rows: 0

Missing values in each column:
currency           0
user_id            0
id                 0
amount_native      0
amount_usd         0
requested_at     294
finished_at      429
dtype: int64
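
The counts above show that some timestamps are missing or could not be parsed. Before enriching the data it may help to mark those rows explicitly rather than dropping them. The following is a minimal sketch on a made-up three-row sample (the `sample` frame and the `has_missing_timestamp` column name are illustrative assumptions, not part of the original notebook):

```python
import pandas as pd

# Hypothetical sample that mimics the timestamp columns of task_1_earn.csv
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "requested_at": ["2020-01-24 22:33:10", "not a date", "2020-01-24 22:12:18"],
    "finished_at": ["2020-01-25 02:00:20", "2020-01-25 02:00:19", None],
})

# Same conversion as in the notebook: unparseable values become NaT
for col in ["requested_at", "finished_at"]:
    sample[col] = pd.to_datetime(sample[col], errors="coerce")

# True where either timestamp is missing or failed to parse
sample["has_missing_timestamp"] = (
    sample[["requested_at", "finished_at"]].isna().any(axis=1)
)

# Rows to review before computing any duration-based metric
missing_rows = sample[sample["has_missing_timestamp"]]
print(missing_rows)
```

Keeping a flag column instead of deleting rows lets end users decide per analysis whether incomplete withdrawals should be excluded.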


In [17]:
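# Add yearweek and yearmonth columns
# Note: %U numbers weeks from the first Sunday of the year (earlier days fall in week 00); it is not the ISO week
df_1['yearweek'] = df_1['requested_at'].dt.strftime('%Y-%U')
df_1['yearmonth'] = df_1['requested_at'].dt.strftime('%Y-%m')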

# Add time_to_complete column (in hours)
df_1['time_to_complete'] = (df_1['finished_at'] - df_1['requested_at']).dt.total_seconds() / 3600

# Check the enriched data
df_1.head()


|   | currency | user_id | id | amount_native | amount_usd | requested_at | finished_at | yearweek | yearmonth | time_to_complete |
|---|----------|---------|----|---------------|------------|--------------|-------------|----------|-----------|------------------|
| 0 | BTC | 44017161 | 117200 | 17.44995 | 1234.82563 | 2020-11-24 22:59:35 | 2020-01-25 14:00:02 | 2020-47 | 2020-11 | -7304.9925 |
| 1 | MATIC | 46740482 | 117197 | 17.450786 | 546.674743 | 2020-01-24 22:33:10 | 2020-01-25 02:00:20 | 2020-03 | 2020-01 | 3.452778 |
| 2 | PEPE | 46489105 | 117194 | 17.446612 | 556.810541 | 2020-01-24 22:12:18 | 2020-01-25 02:00:19 | 2020-03 | 2020-01 | 3.800278 |
| 3 | PEPE | 46117080 | 117193 | 17.446693 | 1045.785866 | 2020-01-24 22:03:20 | 2020-01-25 02:00:19 | 2020-03 | 2020-01 | 3.949722 |
| 4 | AVAX | 47626266 | 117191 | 17.45417 | 653.661058 | 2020-01-24 22:01:30 | 2020-01-25 02:00:41 | 2020-03 | 2020-01 | 3.986389 |
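
Row 0 has a negative `time_to_complete` because its `requested_at` is later than its `finished_at`, which cannot happen for a real withdrawal. One possible way to handle this is to add a validity flag so that such rows can be filtered out of averages and charts. The sketch below uses two rows copied from the output above; the `is_valid_duration` column name is an assumption introduced for illustration:

```python
import pandas as pd

# Two rows taken from the enriched output; the first has inconsistent timestamps
sample = pd.DataFrame({
    "id": [117200, 117197],
    "requested_at": pd.to_datetime(["2020-11-24 22:59:35", "2020-01-24 22:33:10"]),
    "finished_at": pd.to_datetime(["2020-01-25 14:00:02", "2020-01-25 02:00:20"]),
})

# Duration in hours, computed the same way as in the notebook
sample["time_to_complete"] = (
    sample["finished_at"] - sample["requested_at"]
).dt.total_seconds() / 3600

# A withdrawal cannot finish before it was requested.
# Missing durations (NaN) also evaluate to False here.
sample["is_valid_duration"] = sample["time_to_complete"] >= 0

# Subset that is safe to use for duration statistics
valid = sample[sample["is_valid_duration"]]
print(sample[["id", "time_to_complete", "is_valid_duration"]])
```

Whether the invalid rows reflect a data-entry error or a problem on the provider side is not known from the data alone, so flagging them keeps that question open for the analysis.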
