# 2022 Pomodoro

## Purpose 
I have a CSV file with several years of data on how I spend my time working on tasks and projects. This notebook cleans up that file, which holds all of my Pomodoro tasks. I used to do this cleaning in Google Sheets. Now I'd like to do it with pandas, since I'm learning Python.

## Data
This is my personal data. I use an app called BeFocused Pro, a 25-minute timer that can be assigned to a specific task. I use it for projects, school assignments, work tasks, and just about everything else; it's the method I use to stay focused on a task. The data consists of time stamps and covers 2017 to the present day.

## Data Cleaning

1. Import pandas and use `pd.read_csv` to read the CSV with all of my data
2. Review the columns in my data with `data.columns`, then use `data.describe()` to review the characteristics of the numeric fields
3. Remove any duplicates with `data.drop_duplicates(inplace=True)`. The `inplace=True` argument modifies `data` directly; without it, the method returns a new DataFrame and leaves `data` unchanged
4. Check for nulls with `data.isnull().any()` (the `.any()` reports, for each column, whether it contains any null values at all), then remove them, for example with `data.dropna()`
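The four steps above could look roughly like the code below. This is a minimal sketch, not the notebook's actual code. It reads a small inline sample through `io.StringIO` instead of the real `Feb_4.csv`. Two rows come from the output further down. One repeated row and one empty row were added for illustration. The header names are assumed to match the ones this notebook assigns later.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for Feb_4.csv: the ",,," line is an
# empty record and the last row duplicates the one before it
raw = """Start_date,Duration,Assigned_Task,Task_State
,,,
Sep 5 2017 at 8:49:26 PM,25.0,Parachute book,Done
Sep 6 2017 at 2:25:52 PM,25.0,STAR,Done
Sep 6 2017 at 2:25:52 PM,25.0,STAR,Done
"""

# Step 1: read the CSV into a DataFrame
data = pd.read_csv(io.StringIO(raw))

# Step 2: inspect the column names and summarize the numeric fields
print(data.columns.tolist())
print(data.describe())

# Step 3: drop exact duplicate rows, modifying `data` in place
data.drop_duplicates(inplace=True)

# Step 4: report which columns contain nulls, then drop rows with any null
print(data.isnull().any())
data = data.dropna()
print(data)
```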

In [11]:
%pip install pandas

Collecting pandas
  Downloading pandas-1.4.0-cp310-cp310-macosx_11_0_arm64.whl (10.5 MB)
Collecting numpy>=1.21.0
  Downloading numpy-1.22.1-cp310-cp310-macosx_11_0_arm64.whl (12.8 MB)
Collecting pytz>=2020.1
  Downloading pytz-2021.3-py2.py3-none-any.whl (503 kB)
Installing collected packages: pytz, numpy, pandas
Successfully installed numpy-1.22.1 pandas-1.4.0 pytz-2021.3
Note: you may need to restart the kernel to use updated packages.


In [35]:
# datetime is in Python's standard library; this installs Zope's unrelated "DateTime" package
%pip install datetime

Collecting datetime
  Downloading DateTime-4.3-py2.py3-none-any.whl (60 kB)
Collecting zope.interface
  Downloading zope.interface-5.4.0.tar.gz (249 kB)
  Preparing metadata (setup.py) ... done
Using legacy 'setup.py install' for zope.interface, since package 'wheel' is not installed.
Installing collected packages: zope.interface, datetime
    Running setup.py install for zope.interface ... done
Successfully installed datetime-4.3 zope.interface-5.4.0
Note: you may need to restart the kernel to use updated packages.
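Because `datetime` ships with Python, it can be imported directly without any install. The small illustration below is not part of the original notebook. It uses the standard library to parse one of the BeFocused start-date strings shown below. The format string `%b %d %Y at %I:%M:%S %p` is an assumption inferred from those sample values. `%b` and `%p` also assume an English locale.

```python
from datetime import datetime

# datetime is in the standard library: no pip install required
raw_value = "Sep 5 2017 at 8:49:26 PM"

# Assumed format for BeFocused start dates: abbreviated month, day, year,
# the literal word "at", then a 12-hour clock time with AM/PM
parsed = datetime.strptime(raw_value, "%b %d %Y at %I:%M:%S %p")
print(parsed)  # 2017-09-05 20:49:26
```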


### Import Data - Step 1

In [1]:
import pandas as pd

data = pd.read_csv("/Users/ingridarreola/Documents/GitHub/VisualizationProjects/Pomodoro 2022/Feb_4.csv")

### Data Cleaning - Step 2

In [2]:
# Rename columns to short, consistent names

data.columns = ['Start_date', 'Duration', 'Assigned_Task', 'Task_State']
data.head()

|   | Start_date | Duration | Assigned_Task | Task_State |
|---|---|---|---|---|
| 0 |  |  |  |  |
| 1 | Sep 5 2017 at 8:49:26 PM | 25.0 | Parachute book | Done |
| 2 | Sep 6 2017 at 2:25:52 PM | 25.0 | STAR | Done |
| 3 | Sep 6 2017 at 3:05:30 PM | 25.0 | STAR | Done |
| 4 | Sep 6 2017 at 3:38:43 PM | 25.0 | STAR | Done |
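Row 0 in the output above is entirely empty. One possible way to handle this is `dropna(how='all')`. It drops rows where every field is missing but keeps rows that are only partially filled. Below is a minimal sketch on a small hand-built frame. Its values are hypothetical, modeled on the rows above, and the third row's missing task name is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the head() output: row 0 is completely empty
data = pd.DataFrame({
    'Start_date': [np.nan, 'Sep 5 2017 at 8:49:26 PM', 'Sep 6 2017 at 2:25:52 PM'],
    'Duration': [np.nan, 25.0, 25.0],
    'Assigned_Task': [np.nan, 'Parachute book', np.nan],
    'Task_State': [np.nan, 'Done', 'Done'],
})

# how='all' drops a row only when every column is null, so the last row
# (which is missing just its task name) is kept
data = data.dropna(how='all').reset_index(drop=True)
print(data)
```

With the default `how='any'`, the partially filled row would be dropped as well. `how='all'` is the more conservative choice when the goal is only to remove the blank record.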


### Creating Time Stamps
Use `pd.to_datetime` to parse the start dates into time stamps, then split each time stamp into separate columns (date, year, day). This makes it easier to filter the data, for example to the current year.

In [3]:
# Parse Start_date into a proper datetime column on the DataFrame

data['Time_Stamp'] = pd.to_datetime(data['Start_date'])

# Reuse the parsed column rather than parsing the same strings twice
stamp = data['Time_Stamp']

data.head()

|   | Start_date | Duration | Assigned_Task | Task_State | Time_Stamp |
|---|---|---|---|---|---|
| 0 |  |  |  |  | NaT |
| 1 | Sep 5 2017 at 8:49:26 PM | 25.0 | Parachute book | Done | 2017-09-05 20:49:26 |
| 2 | Sep 6 2017 at 2:25:52 PM | 25.0 | STAR | Done | 2017-09-06 14:25:52 |
| 3 | Sep 6 2017 at 3:05:30 PM | 25.0 | STAR | Done | 2017-09-06 15:05:30 |
| 4 | Sep 6 2017 at 3:38:43 PM | 25.0 | STAR | Done | 2017-09-06 15:38:43 |
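Letting pandas infer the format worked here. Passing an explicit `format`, however, avoids guessing, and a change in the export format then fails loudly instead of being misparsed. With `errors='coerce'`, any non-matching value becomes `NaT` instead of raising. The sketch below assumes the format string `%b %d %Y at %I:%M:%S %p`, inferred from the sample values above, and runs on a hypothetical three-row sample. It also parses the column only once and reuses the result.

```python
import pandas as pd

# Hypothetical sample of the Start_date column, including the empty first row
data = pd.DataFrame({'Start_date': [None, 'Sep 5 2017 at 8:49:26 PM',
                                    'Sep 6 2017 at 2:25:52 PM']})

# Assumed BeFocused export format; anything that does not match becomes NaT
data['Time_Stamp'] = pd.to_datetime(
    data['Start_date'], format='%b %d %Y at %I:%M:%S %p', errors='coerce'
)

# Reuse the parsed column instead of calling pd.to_datetime a second time
stamp = data['Time_Stamp']
print(stamp)
```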


In [4]:
data['Short_Date'] = stamp.dt.strftime('%m/%d/%Y')

data['Year'] = stamp.dt.strftime('%Y')

data['Day'] = stamp.dt.strftime('%d')

data.head()

|   | Start_date | Duration | Assigned_Task | Task_State | Time_Stamp | Short_Date | Year | Day |
|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  | NaT |  |  |  |
| 1 | Sep 5 2017 at 8:49:26 PM | 25.0 | Parachute book | Done | 2017-09-05 20:49:26 | 09/05/2017 | 2017.0 | 5.0 |
| 2 | Sep 6 2017 at 2:25:52 PM | 25.0 | STAR | Done | 2017-09-06 14:25:52 | 09/06/2017 | 2017.0 | 6.0 |
| 3 | Sep 6 2017 at 3:05:30 PM | 25.0 | STAR | Done | 2017-09-06 15:05:30 | 09/06/2017 | 2017.0 | 6.0 |
| 4 | Sep 6 2017 at 3:38:43 PM | 25.0 | STAR | Done | 2017-09-06 15:38:43 | 09/06/2017 | 2017.0 | 6.0 |
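`strftime` returns strings, so `Year` and `Day` are built as text such as `'2017'` and `'05'`. If numeric columns are more useful for filtering or grouping, the `.dt.year` and `.dt.day` accessors are an alternative. Below is a sketch on hypothetical timestamps. When a `NaT` row is present, the numeric results come back as floats, because `NaN` forces a float dtype.

```python
import pandas as pd

# Hypothetical parsed timestamps, including a missing value like row 0 above
stamp = pd.Series(pd.to_datetime([None, '2017-09-05 20:49:26', '2022-01-03 09:15:00']))

# strftime gives strings (NaN where the timestamp is missing)
year_text = stamp.dt.strftime('%Y')

# .dt.year / .dt.day give numbers; NaT rows become NaN, so the dtype is float
year_num = stamp.dt.year
day_num = stamp.dt.day

print(year_text.tolist())
print(year_num.tolist())
```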


### 2022 Data

In [14]:
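# Interested in 2022 for present-year analysis. Compare the parsed timestamps,
# not the 'MM/DD/YYYY' strings in Short_Date: string comparison goes character
# by character and does not order dates correctly.
current_year = data['Time_Stamp'] >= pd.Timestamp('2022-01-01')
# Apply the boolean mask to keep only the 2022 rows
data_2022 = data[current_year]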
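Comparing the `MM/DD/YYYY` strings in `Short_Date` against `'1/1/2022'` is a lexicographic (character-by-character) comparison, not a date comparison. For example, `'01/15/2022'` sorts before `'1/1/2022'` because `'0' < '1'`, while `'12/01/2017'` sorts after it. The sketch below uses hypothetical dates to contrast that with filtering on the parsed timestamps. It then applies the resulting mask to actually select rows.

```python
import pandas as pd

# Hypothetical timestamps, one per case of interest
stamp = pd.Series(pd.to_datetime(['2017-12-01', '2022-01-15', '2021-06-30']))
data = pd.DataFrame({'Time_Stamp': stamp,
                     'Short_Date': stamp.dt.strftime('%m/%d/%Y')})

# String comparison: decided character by character, so the results are wrong
string_mask = data['Short_Date'] >= '1/1/2022'

# Datetime comparison: compares actual points in time
date_mask = data['Time_Stamp'] >= pd.Timestamp('2022-01-01')

# A boolean mask only selects rows once it is used to index the DataFrame
data_2022 = data[date_mask]

print(string_mask.tolist())  # [True, False, False]
print(date_mask.tolist())    # [False, True, False]
```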

In [12]:
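# Save the full DataFrame; index=False avoids writing the row index as an extra unnamed column
data.to_csv('/Users/ingridarreola/Documents/GitHub/VisualizationProjects/Pomodoro 2022/Feb_4_2022.csv', index=False)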
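By default, `to_csv` also writes the DataFrame index as an unnamed first column. When the file is read back in, that column appears as `Unnamed: 0`. The sketch below writes to an in-memory buffer instead of the real file path and shows how `index=False` avoids the extra column. The two-row frame is hypothetical.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the cleaned data
data = pd.DataFrame({'Assigned_Task': ['STAR', 'STAR'], 'Task_State': ['Done', 'Done']})

# Default: the index is written too and reappears as an extra column
with_index = pd.read_csv(io.StringIO(data.to_csv()))

# index=False: only the real columns are written
without_index = pd.read_csv(io.StringIO(data.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'Assigned_Task', 'Task_State']
print(without_index.columns.tolist())  # ['Assigned_Task', 'Task_State']
```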