# Playground
This is a playground for testing code snippets and examples.

Resources:
- [w3schools Python](https://www.w3schools.com/python/default.asp)
- [w3schools Pandas](https://www.w3schools.com/python/pandas/default.asp)

## Pandas Basics

In [11]:
import pandas as pd
df = pd.read_csv('data/pandas_w3.csv')
print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]


**Read JSON data from a URL**

In [12]:
import requests

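# A timeout keeps the cell from hanging if the server does not respond
response = requests.get('https://dummyjson.com/users', timeout=10)
if response.status_code == 200:
    data = response.json()
    df_users = pd.DataFrame(data['users'])  # The user records sit under the 'users' key
    print(df_users.head(10))
else:
    print(f"Error: {response.status_code}, {response.reason}")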

   id  firstName  lastName maidenName  age  gender  \
0   1      Emily   Johnson      Smith   28  female   
1   2    Michael  Williams              35    male   
2   3     Sophia     Brown              42  female   
3   4      James     Davis              45    male   
4   5       Emma    Miller    Johnson   30  female   
5   6     Olivia    Wilson              22  female   
6   7  Alexander     Jones              38    male   
7   8        Ava    Taylor              27  female   
8   9      Ethan  Martinez              33    male   
9  10   Isabella  Anderson      Davis   31  female   

                               email             phone    username  \
0      emily.johnson@x.dummyjson.com  +81 965-431-3024      emilys   
1   michael.williams@x.dummyjson.com  +49 258-627-6644    michaelw   
2       sophia.brown@x.dummyjson.com  +81 210-652-2785     sophiab   
3        james.davis@x.dummyjson.com  +49 614-958-9364      jamesd   
4        emma.miller@x.dummyjson.com  +91 759-776-1614 
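
If the records contain nested objects, `pd.DataFrame` keeps each one as a dict inside a single column. A minimal sketch of flattening them with `pd.json_normalize` follows. It uses a small hand-written sample instead of the live response, and the `address` and `city` fields are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical sample shaped like an API payload; not real data from the URL.
sample = {
    "users": [
        {"id": 1, "firstName": "Emily", "address": {"city": "Phoenix"}},
        {"id": 2, "firstName": "Michael", "address": {"city": "Houston"}},
    ]
}

# json_normalize expands nested dicts into columns with dotted names.
df_flat = pd.json_normalize(sample["users"])
print(df_flat.columns.tolist())
print(df_flat)
```

The nested `address` dict becomes an `address.city` column, which can then be filtered or grouped like any other column.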

**DataFrame Info**

In [13]:
df = pd.read_csv('data/pandas_w3.csv')
print(df.info())  # info() prints its report itself and returns None, hence the trailing "None"

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  169 non-null    int64  
 1   Pulse     169 non-null    int64  
 2   Maxpulse  169 non-null    int64  
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
None
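
`info()` reports 164 non-null `Calories` values out of 169 entries, so five cells are empty. A short sketch of counting and locating missing values directly, using a small made-up frame in place of the CSV (the values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: two Calories cells are missing.
sample = pd.DataFrame({
    "Duration": [60, 45, 60, 30],
    "Calories": [409.1, np.nan, 340.0, np.nan],
})

# Number of missing cells in each column.
missing_per_column = sample.isna().sum()

# The rows that contain at least one missing cell.
rows_with_missing = sample[sample.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Looking at the affected rows first makes it easier to decide whether to drop them or fill them in.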


## Cleaning Data

### Empty Cells

**Remove Rows**

In [14]:
df = pd.read_csv('data/pandas_w3.csv')

df.dropna(inplace=True)  # Remove rows with any empty cells
print(f"{df.shape[0]} rows after dropping empty cells.")
print(df.info())
print(df.to_string())

164 rows after dropping empty cells.
<class 'pandas.core.frame.DataFrame'>
Index: 164 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  164 non-null    int64  
 1   Pulse     164 non-null    int64  
 2   Maxpulse  164 non-null    int64  
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 6.4 KB
None
     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
5          60    102       127     300.0
6          60    110       136     374.0
7          45    104       134     253.3
8          30    109       133     195.1
9          60     98       124     269.0
10         60    103       147     329.3
11         60    100       120     250.7
12         60    106       128     345.3
[output truncated]
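
After `dropna()` the index still runs from 0 to 168 with gaps, as the `Index: 164 entries, 0 to 168` line shows. A hedged sketch of two common variations: dropping rows only when a chosen column is empty, and renumbering the index afterwards. The sample frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one gap in each column.
sample = pd.DataFrame({
    "Duration": [60, 45, np.nan, 30],
    "Calories": [409.1, np.nan, 340.0, 195.1],
})

# Drop rows only when Calories is missing, then renumber the index from 0.
# Without inplace=True, the original frame is left untouched.
cleaned = sample.dropna(subset=["Calories"]).reset_index(drop=True)

print(cleaned)
```

The row with a missing `Duration` survives because `subset` restricts the check to `Calories`.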

**Replace Empty Values**

In [15]:
df = pd.read_csv('data/pandas_w3.csv')
# Replace empty values in every column:
# df.fillna(130, inplace=True)

# Replace empty values only in the specified columns
df.fillna({"Calories": 130}, inplace=True)

print(df.to_string())

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
5          60    102       127     300.0
6          60    110       136     374.0
7          45    104       134     253.3
8          30    109       133     195.1
9          60     98       124     269.0
10         60    103       147     329.3
11         60    100       120     250.7
12         60    106       128     345.3
13         60    104       132     379.3
14         60     98       123     275.0
15         60     98       120     215.2
16         60    100       120     300.0
17         45     90       112     130.0
18         60    103       123     323.0
19         45     97       125     243.0
20         60    108       131     364.2
21         45    100       119     282.0
22         60    130       101     300.0
23         45   

**Replace Using Mean, Median, or Mode**

In [16]:
df = pd.read_csv('data/pandas_w3.csv')
# Replace using the median; df["Calories"].mean() or df["Calories"].mode()[0] work the same way
x = df["Calories"].median()
df.fillna({"Calories": x}, inplace=True)
print(df.to_string())

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
5          60    102       127     300.0
6          60    110       136     374.0
7          45    104       134     253.3
8          30    109       133     195.1
9          60     98       124     269.0
10         60    103       147     329.3
11         60    100       120     250.7
12         60    106       128     345.3
13         60    104       132     379.3
14         60     98       123     275.0
15         60     98       120     215.2
16         60    100       120     300.0
17         45     90       112     318.6
18         60    103       123     323.0
19         45     97       125     243.0
20         60    108       131     364.2
21         45    100       119     282.0
22         60    130       101     300.0
23         45   
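
The three statistics can give different fill values. A small sketch on a made-up series (the numbers are assumptions chosen so the arithmetic is easy to check by hand):

```python
import numpy as np
import pandas as pd

# Hypothetical Calories values with one missing entry.
calories = pd.Series([300.0, 300.0, 250.0, 410.0, np.nan])

mean_value = calories.mean()      # (300 + 300 + 250 + 410) / 4 = 315.0
median_value = calories.median()  # middle of 250, 300, 300, 410 = 300.0
mode_value = calories.mode()[0]   # most frequent value = 300.0

# Fill the gap with the median; NaN is ignored when computing all three.
filled = calories.fillna(median_value)

print(mean_value, median_value, mode_value)
print(filled)
```

The mean is pulled upward by the single large value, while the median and mode are not, which is one reason the median is often preferred when outliers are present.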

### Cleaning Wrong Format

**Convert Into a Correct Format**

In [24]:
df = pd.read_csv('data/pandas_w3.csv')

df['Date'] = pd.to_datetime(df['Date'], format='mixed')  # Parse each value individually (format='mixed' requires pandas >= 2.0)
df.dropna(subset=['Date'], inplace=True)  # Drop rows whose date is missing (NaT after conversion)

print(df.info())
print(df.to_string())

<class 'pandas.core.frame.DataFrame'>
Index: 31 entries, 0 to 31
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   Duration  31 non-null     int64         
 1   Date      31 non-null     datetime64[ns]
 2   Pulse     31 non-null     int64         
 3   Maxpulse  31 non-null     int64         
 4   Calories  29 non-null     float64       
dtypes: datetime64[ns](1), float64(1), int64(3)
memory usage: 1.5 KB
None
    Duration       Date  Pulse  Maxpulse  Calories
0         60 2020-12-01    110       130     409.1
1         60 2020-12-02    117       145     479.0
2         60 2020-12-03    103       135     340.0
3         45 2020-12-04    109       175     282.4
4         45 2020-12-05    117       148     406.0
5         60 2020-12-06    102       127     300.0
6         60 2020-12-07    110       136     374.0
7        450 2020-12-08    104       134     253.3
8         30 2020-12-09    109       133     195
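
By default `pd.to_datetime` raises an error on a value it cannot parse, so only missing dates become `NaT`. A sketch of the alternative, assuming unparseable values should also be discarded: pass `errors="coerce"` to turn them into `NaT`, then drop them. The sample values and the explicit `%Y/%m/%d` format are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample: one missing date and one unparseable string.
sample = pd.DataFrame({
    "Date": ["2020/12/01", "2020/12/02", None, "not a date"],
    "Pulse": [110, 117, 103, 100],
})

# errors="coerce" converts anything that does not match the format to NaT.
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y/%m/%d", errors="coerce")

# Remove the rows whose date could not be parsed or was missing.
cleaned = sample.dropna(subset=["Date"])

print(sample["Date"].dtype)
print(cleaned)
```

Coercing silently hides bad input, so it is worth checking how many rows were dropped before relying on the result.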

### Cleaning Wrong Data

In [28]:
df = pd.read_csv('data/pandas_w3.csv')

df.loc[7, 'Duration'] = 45 # Correct the wrong value in 'Duration' column

print(df)

    Duration          Date  Pulse  Maxpulse  Calories
0         60  '2020/12/01'    110       130     409.1
1         60  '2020/12/02'    117       145     479.0
2         60  '2020/12/03'    103       135     340.0
3         45  '2020/12/04'    109       175     282.4
4         45  '2020/12/05'    117       148     406.0
5         60  '2020/12/06'    102       127     300.0
6         60  '2020/12/07'    110       136     374.0
7         45  '2020/12/08'    104       134     253.3
8         30  '2020/12/09'    109       133     195.1
9         60  '2020/12/10'     98       124     269.0
10        60  '2020/12/11'    103       147     329.3
11        60  '2020/12/12'    100       120     250.7
12        60  '2020/12/12'    100       120     250.7
13        60  '2020/12/13'    106       128     345.3
14        60  '2020/12/14'    104       132     379.3
15        60  '2020/12/15'     98       123     275.0
16        60  '2020/12/16'     98       120     215.2
17        60  '2020/12/17'  
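
Fixing one cell by hand works for a small file. For larger data, a rule is usually more practical. A minimal sketch, assuming any `Duration` above 120 minutes is an entry error (the threshold and the sample values are assumptions):

```python
import pandas as pd

# Hypothetical sample with two implausible durations.
sample = pd.DataFrame({
    "Duration": [60, 450, 45, 300],
    "Pulse": [110, 104, 109, 98],
})

limit = 120  # assumed maximum plausible duration in minutes

# Option 1: cap every value above the limit.
capped = sample.copy()
capped.loc[capped["Duration"] > limit, "Duration"] = limit

# Option 2: drop the rows that break the rule.
dropped = sample[sample["Duration"] <= limit]

print(capped)
print(dropped)
```

Capping keeps the other columns of the row, while dropping removes the whole observation; which is appropriate depends on how much the rest of the row can be trusted.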

### Removing Duplicates

In [34]:
df = pd.read_csv('data/pandas_w3.csv')
df.drop_duplicates(inplace=True)  # Remove duplicate rows
print(df.duplicated().sum())  # Confirm that no duplicates remain

0
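
The count is 0 here because the duplicates were already removed. A short sketch of inspecting duplicates before dropping them, on a made-up frame where the second row repeats the first:

```python
import pandas as pd

# Hypothetical sample: row 1 is an exact copy of row 0.
sample = pd.DataFrame({
    "Duration": [60, 60, 60, 45],
    "Pulse": [100, 100, 106, 109],
})

# duplicated() marks every repeat of an earlier row as True.
dup_mask = sample.duplicated()
print(dup_mask.sum())

# drop_duplicates() keeps the first occurrence by default.
deduped = sample.drop_duplicates()
print(deduped)
```

Rows 0 and 2 share a `Duration` but differ in `Pulse`, so only the exact copy is removed.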


## Correlations

In [39]:
df = pd.read_csv('data/data_correlations.csv')

print(df)

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.0
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]
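
The cell above only loads and displays the data. A minimal sketch of the correlation step itself, using `DataFrame.corr()` on a made-up frame (the values are assumptions, chosen so that `Calories` is exactly five times `Duration`):

```python
import pandas as pd

# Hypothetical sample: Calories is a perfect linear function of Duration.
sample = pd.DataFrame({
    "Duration": [30, 45, 60, 75, 90],
    "Calories": [150.0, 225.0, 300.0, 375.0, 450.0],
    "Pulse": [120, 100, 115, 98, 110],
})

# corr() returns the pairwise Pearson correlation between numeric columns.
corr = sample.corr()
print(corr)
```

Each value lies between -1 and 1. The `Duration`/`Calories` entry is 1 because the relationship is perfectly linear, and the diagonal is always 1 since every column correlates perfectly with itself.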
