### Overview

I completed a 12-day course in 4 days. Python basics were easy, but pandas was difficult, and setting up Miniconda took more time than expected.


# Importing Libraries and Reading the CSV File

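First, we import the libraries this notebook relies on: **pandas** to read and manipulate the CSV file, and **Plotly Express** for the interactive charts. NumPy and seaborn are imported as well, although the cells below do not use them.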

In [15]:
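import pandas as pd          # data loading and manipulation
import numpy as np           # numerical helpers (not used below)
import plotly.express as px  # interactive charts
import seaborn as sns        # statistical plots (not used below)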



In [16]:
# Load the learning log into a DataFrame (reading it once is enough)
df = pd.read_csv("learning_log.csv")



In [17]:
df.head()


| | date | topic | duration_minutes | difficulty | notes |
|---|---|---|---|---|---|
| 0 | 2025-12-08 | Course Introduction & Resources | 120 | 2 | Videos and blogs overview |
| 1 | 2025-12-08 | Python Basics | 120 | 2 | Operators and syntax |
| 2 | 2025-12-09 | Python Basics | 240 | 2 | If else and data types |
| 3 | 2025-12-10 | Python Practice | 240 | 2 | Data structures and practice |
| 4 | 2025-12-11 | Pandas | 240 | 4 | Data handling was difficult |


In [18]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   date              6 non-null      object
 1   topic             6 non-null      object
 2   duration_minutes  6 non-null      int64 
 3   difficulty        6 non-null      int64 
 4   notes             6 non-null      object
dtypes: int64(2), object(3)
memory usage: 372.0+ bytes
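`df.info()` lists `date` as `object`, which means the dates were read as plain strings. One option, sketched here as an assumption rather than something this notebook does, is to parse them at load time with the `parse_dates` parameter of `pd.read_csv`. The two rows below are copied from the `df.head()` output, and `io.StringIO` stands in for `learning_log.csv` so the example runs on its own:

```python
import io

import pandas as pd

# Two rows copied from the df.head() output; StringIO replaces the real CSV file here
csv_text = """date,topic,duration_minutes,difficulty,notes
2025-12-08,Course Introduction & Resources,120,2,Videos and blogs overview
2025-12-11,Pandas,240,4,Data handling was difficult
"""

# parse_dates converts the listed columns to datetime64 while reading
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df_dates.dtypes)
```

With the real file, the equivalent call would be `pd.read_csv("learning_log.csv", parse_dates=["date"])`. Datetime columns sort chronologically and support the `.dt` accessor (for example `.dt.day`).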


In [19]:
df.describe()

| | duration_minutes | difficulty |
|---|---|---|
| count | 6.0 | 6.0 |
| mean | 180.0 | 2.666667 |
| std | 65.726707 | 1.032796 |
| min | 120.0 | 2.0 |
| 25% | 120.0 | 2.0 |
| 50% | 180.0 | 2.0 |
| 75% | 240.0 | 3.5 |
| max | 240.0 | 4.0 |
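By default `describe()` summarizes only numeric columns, which is why `date`, `topic`, and `notes` do not appear in the table above. Calling it on a text column produces a different summary: the count, the number of unique values, the most frequent value (`top`), and how often it occurs (`freq`). A minimal sketch, rebuilding the `topic` column by hand from the six logged rows:

```python
import pandas as pd

# Topic labels copied from the six rows of the learning log
topic = pd.Series(
    [
        "Course Introduction & Resources",
        "Python Basics",
        "Python Basics",
        "Python Practice",
        "Pandas",
        "Environment Setup (Miniconda)",
    ],
    name="topic",
)

# For a text column, describe() reports count, unique, top and freq
summary = topic.describe()
print(summary)
```

In the notebook itself, `df["topic"].describe()` gives the same kind of summary for the loaded data.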


## Cleaning Data: Missing Values and Duplicates

When working with data, it's common to encounter **missing values** or **duplicate rows**. If they are not handled properly, they can cause errors or misleading results.

- **Missing Values**:  
  Some rows may have empty or null (NaN) values. Removing the affected rows or filling in a replacement value helps keep analyses and calculations accurate.

- **Duplicate Rows**:  
  Sometimes the same data entry appears more than once. Removing duplicates prevents double-counting and keeps the dataset clean and reliable.
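The two fixes can be compared on a small, made-up table (the rows below are hypothetical, not taken from `learning_log.csv`): `dropna()` removes rows that contain a missing value, `fillna()` keeps them and substitutes a value, and `drop_duplicates()` keeps only the first copy of identical rows. Filling with 0 is just one illustrative choice:

```python
import numpy as np
import pandas as pd

# Hypothetical log: one missing duration and one repeated row
log = pd.DataFrame({
    "topic": ["Python Basics", "Pandas", "Pandas", "Plotting"],
    "duration_minutes": [120, 240, 240, np.nan],
})

dropped = log.dropna()                          # removes the Plotting row (NaN duration)
filled = log.fillna({"duration_minutes": 0})    # keeps it; NaN becomes 0
deduped = log.drop_duplicates()                 # keeps the first of the two identical Pandas rows
both = log.dropna().drop_duplicates()           # same order as the cleaning cells below

print(len(log), len(dropped), len(filled), len(deduped), len(both))
```

Dropping loses the whole row, while filling keeps it but introduces a value that was never recorded, so the right choice depends on what the missing field means.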


In [20]:
import pandas as pd

# Reload the original CSV so cleaning starts from the raw data
df = pd.read_csv("learning_log.csv")


In [21]:
# Drop every row that contains at least one missing value; keep the result in a new variable
df_no_missing = df.dropna()


In [22]:
# Remove duplicates from df_no_missing and save in another variable
df_clean = df_no_missing.drop_duplicates()


In [23]:
df_clean

| | date | topic | duration_minutes | difficulty | notes |
|---|---|---|---|---|---|
| 0 | 2025-12-08 | Course Introduction & Resources | 120 | 2 | Videos and blogs overview |
| 1 | 2025-12-08 | Python Basics | 120 | 2 | Operators and syntax |
| 2 | 2025-12-09 | Python Basics | 240 | 2 | If else and data types |
| 3 | 2025-12-10 | Python Practice | 240 | 2 | Data structures and practice |
| 4 | 2025-12-11 | Pandas | 240 | 4 | Data handling was difficult |
| 5 | 2025-12-11 | Environment Setup (Miniconda) | 120 | 4 | Installation and setup issues |
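`df_clean` still has all six rows, so this log had no missing values or duplicates to remove, which matches `df.info()` reporting 6 non-null values in every column. Counting the problems with `isna()` and `duplicated()` shows this before anything is dropped. A minimal sketch on the six rows above (the `notes` column is left out to keep it short, and the variable is named `log` so it does not overwrite the notebook's `df`):

```python
import pandas as pd

# The six logged rows, copied from the df_clean output (notes column omitted)
log = pd.DataFrame({
    "date": ["2025-12-08", "2025-12-08", "2025-12-09",
             "2025-12-10", "2025-12-11", "2025-12-11"],
    "topic": ["Course Introduction & Resources", "Python Basics", "Python Basics",
              "Python Practice", "Pandas", "Environment Setup (Miniconda)"],
    "duration_minutes": [120, 120, 240, 240, 240, 120],
    "difficulty": [2, 2, 2, 2, 4, 4],
})

# Count problems before removing anything
missing_per_column = log.isna().sum()   # NaN count for each column
duplicate_rows = log.duplicated().sum() # rows identical to an earlier row

print(missing_per_column.sum(), duplicate_rows)
```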


## Plot 1: Total Study Duration Per Day

This plot shows how many minutes were spent studying each day. It helps track **daily progress** and see which days you studied more or less.
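The plot cell first collapses the log to one row per date by summing `duration_minutes`. A minimal sketch of that aggregation on the logged values, so the numbers behind the line are visible:

```python
import pandas as pd

# date and duration_minutes from the six logged rows
log = pd.DataFrame({
    "date": ["2025-12-08", "2025-12-08", "2025-12-09",
             "2025-12-10", "2025-12-11", "2025-12-11"],
    "duration_minutes": [120, 120, 240, 240, 240, 120],
})

# One row per date with the minutes summed, as in the plot cell
daily = log.groupby("date")["duration_minutes"].sum().reset_index()
print(daily)
```

The first three days each total 240 minutes and 2025-12-11 totals 360, so the line stays flat until the last day; over all four days that is 1080 minutes, or 18 hours.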


In [24]:
import plotly.express as px

# Group total duration per day
daily_duration = df_clean.groupby('date')['duration_minutes'].sum().reset_index()

# Interactive line chart
plot1 = px.line(daily_duration, x='date', y='duration_minutes',
                title='Total Study Duration Per Day', markers=True)
plot1.update_layout(xaxis_title='Date', yaxis_title='Minutes Studied')
plot1.show()


## Plot 2: Difficulty Level per Topic

This bar chart shows the **difficulty level of each topic**. It helps visualize which topics were easy and which were challenging.
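`Python Basics` has two rows in the log. If `df_clean` is passed straight to `px.bar`, each row becomes its own bar segment and Plotly Express stacks segments that share an x value by default, so that topic's bar reaches 4, the same height as Pandas, even though both sessions were rated 2. Aggregating to one value per topic before plotting avoids this. A quick check comparing the summed (stacked) and averaged difficulty; using the mean is an assumption, and `max` would also be a reasonable choice:

```python
import pandas as pd

# topic and difficulty from the six logged rows
log = pd.DataFrame({
    "topic": ["Course Introduction & Resources", "Python Basics", "Python Basics",
              "Python Practice", "Pandas", "Environment Setup (Miniconda)"],
    "difficulty": [2, 2, 2, 2, 4, 4],
})

# What stacked bars display: difficulty added up over repeated topic rows
stacked = log.groupby("topic")["difficulty"].sum()
# One value per topic: the average difficulty rating
averaged = log.groupby("topic")["difficulty"].mean()

print(pd.DataFrame({"stacked": stacked, "averaged": averaged}))
```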


In [25]:
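# One bar per topic: Python Basics has two rows, so average them instead of stacking two bars
topic_difficulty = df_clean.groupby('topic', as_index=False)['difficulty'].mean()

# Interactive bar chart for difficulty
plot2 = px.bar(topic_difficulty, x='topic', y='difficulty', color='difficulty',
               title='Difficulty Level per Topic', text='difficulty')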
plot2.update_layout(xaxis_title='Topic', yaxis_title='Difficulty')
plot2.show()


## Plot 3: Duration vs Difficulty

This scatter plot compares **how much time was spent on each topic** against its difficulty.  
Point size also encodes duration and color encodes difficulty, so each value is shown twice: once by position and once by size or color.
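With only six rows there are just four distinct (duration, difficulty) combinations, so some markers sit exactly on top of each other and hovering reveals only one of the overlapping topics. Counting rows per combination makes the overlap visible. A minimal sketch on the logged values:

```python
import pandas as pd

# duration_minutes and difficulty from the six logged rows
log = pd.DataFrame({
    "duration_minutes": [120, 120, 240, 240, 240, 120],
    "difficulty": [2, 2, 2, 2, 4, 4],
})

# Rows that share a (duration, difficulty) pair are drawn at the same point
overlap = log.groupby(["duration_minutes", "difficulty"]).size()
print(overlap)
```

The pairs (120, 2) and (240, 2) each stand for two rows, so the chart shows four markers for six entries.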


In [26]:
# Scatter plot for duration vs difficulty
plot3 = px.scatter(df_clean, x='duration_minutes', y='difficulty',
                   size='duration_minutes', color='difficulty',
                   hover_data=['topic'],  # topic appears only in the hover tooltip
                   title='Duration vs Difficulty')
plot3.update_layout(xaxis_title='Duration (minutes)', yaxis_title='Difficulty')
plot3.show()
