# 📊 Dataset Information

**Note:** The dataset for this project is **not included in this repository** to keep the repository lightweight.  

You can download the dataset from **Kaggle** using the link below:  

[Download the Steam Dataset](https://www.kaggle.com/datasets/fmpugliese/steam-all-games-data)  

Place the downloaded CSV file in the same directory as this notebook before running any code. The notebook reads it under the name `all_data.csv`, so rename the file if the download uses a different name.
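As a minimal sketch of checking the placement step above before loading anything, the helper below fails early with a readable message when the file is missing. The function name `dataset_path` and the constant `CSV_NAME` are hypothetical and not part of the original notebook; the file name itself is taken from the `read_csv` call in the loading cell.

```python
from pathlib import Path

# File name assumed from the read_csv call used in this notebook.
CSV_NAME = "all_data.csv"


def dataset_path(directory: str = ".") -> Path:
    """Return the path to the CSV, or raise a clear error if it is missing."""
    path = Path(directory) / CSV_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{CSV_NAME} not found in {path.parent.resolve()}; "
            "download it from Kaggle and place it next to this notebook."
        )
    return path
```

The returned path can be passed straight to `pd.read_csv`, for example `pd.read_csv(dataset_path())`.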

# Importing

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Load the CSV into a DataFrame

In [2]:
df = pd.read_csv('all_data.csv')

# Preprocessing

## Details of Dataset

### First Five Rows

In [3]:
df.head()

,Unnamed: 0,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,ccu
0,0,10,Counter-Strike,Valve,Valve,,243818,6427,0,"10,000,000 .. 20,000,000",12222,563,204,88,199.0,999.0,80.0,7323
1,1,20,Team Fortress Classic,Valve,Valve,,7602,1136,0,"1,000,000 .. 2,000,000",361,6722,15,6722,499.0,499.0,0.0,66
2,2,30,Day of Defeat,Valve,Valve,,6414,688,0,"5,000,000 .. 10,000,000",859,3485,23,3604,499.0,499.0,0.0,87
3,3,40,Deathmatch Classic,Valve,Valve,,2618,545,0,"5,000,000 .. 10,000,000",353,4,10,4,499.0,499.0,0.0,7
4,4,50,Half-Life: Opposing Force,Gearbox Software,Valve,,24363,1198,0,"2,000,000 .. 5,000,000",528,78,162,78,499.0,499.0,0.0,74


### Last Five Rows

In [4]:
df.tail()

,Unnamed: 0,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,ccu
86533,86533,3739280,Nocturne FX,Moutoux Software,Moutoux Software,,1,0,0,"0 .. 20,000",0,0,0,0,299.0,299.0,0.0,0
86534,86534,3740180,To Late_The last room,Varga Gábor,Varga Gábor,,0,1,0,"0 .. 20,000",0,0,0,0,299.0,299.0,0.0,0
86535,86535,3740890,Stray Tekirs,VIBRA,VIBRA,,3,0,0,"0 .. 20,000",0,0,0,0,199.0,199.0,0.0,0
86536,86536,3743620,Chill Train - Densha no kuni,Judox Studio,Judox Studio,,1,0,0,"0 .. 20,000",0,0,0,0,599.0,599.0,0.0,0
86537,86537,3744990,Archipelago Luminary,Gove,Gove,,4,0,0,"0 .. 20,000",0,0,0,0,199.0,199.0,0.0,0


### Shape of our dataset

In [5]:
df.shape

(86538, 18)

### List All Columns

In [6]:
df.columns

Index(['Unnamed: 0', 'appid', 'name', 'developer', 'publisher', 'score_rank',
       'positive', 'negative', 'userscore', 'owners', 'average_forever',
       'average_2weeks', 'median_forever', 'median_2weeks', 'price',
       'initialprice', 'discount', 'ccu'],
      dtype='object')
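The `Unnamed: 0` column in this listing looks like a row index that was written into the CSV when the file was saved. The sketch below, on a toy frame that mimics the first three columns, shows one way to confirm that and drop the column. Whether to drop it in the real data is a judgment call; the original notebook keeps it, and the names `sample`, `is_redundant` and `cleaned` are illustrative only.

```python
import pandas as pd

# Toy frame mimicking the first columns of the real data.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "appid": [10, 20, 30],
    "name": ["Counter-Strike", "Team Fortress Classic", "Day of Defeat"],
})

# The column is redundant only if it matches the row index exactly.
is_redundant = (sample["Unnamed: 0"] == sample.index).all()

# Drop it when redundant; otherwise leave the frame untouched.
cleaned = sample.drop(columns="Unnamed: 0") if is_redundant else sample
print(cleaned.columns.tolist())
```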

### Data Type of Each Column

In [7]:
df.dtypes

Unnamed: 0           int64
appid                int64
name                object
developer           object
publisher           object
score_rank         float64
positive             int64
negative             int64
userscore            int64
owners              object
average_forever      int64
average_2weeks       int64
median_forever       int64
median_2weeks        int64
price              float64
initialprice       float64
discount           float64
ccu                  int64
dtype: object
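The `owners` column is stored as `object` because each value is a text range such as `"10,000,000 .. 20,000,000"`. If numeric bounds are needed later, one possible approach is sketched below on three values copied from the rows shown earlier. The names `owners_min` and `owners_max` are hypothetical and do not exist in the original notebook.

```python
import pandas as pd

# Three owner ranges copied from the head() and tail() output.
owners = pd.Series([
    "10,000,000 .. 20,000,000",
    "1,000,000 .. 2,000,000",
    "0 .. 20,000",
], name="owners")

# Split on the literal ".." separator into two text columns.
bounds = owners.str.split("..", regex=False, expand=True)

# Remove thousands separators and surrounding spaces, then convert to integers.
owners_min = bounds[0].str.replace(",", "", regex=False).str.strip().astype("int64")
owners_max = bounds[1].str.replace(",", "", regex=False).str.strip().astype("int64")

print(pd.DataFrame({"owners_min": owners_min, "owners_max": owners_max}))
```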

### Information About All Columns

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86538 entries, 0 to 86537
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       86538 non-null  int64  
 1   appid            86538 non-null  int64  
 2   name             86520 non-null  object 
 3   developer        86227 non-null  object 
 4   publisher        85924 non-null  object 
 5   score_rank       47 non-null     float64
 6   positive         86538 non-null  int64  
 7   negative         86538 non-null  int64  
 8   userscore        86538 non-null  int64  
 9   owners           86538 non-null  object 
 10  average_forever  86538 non-null  int64  
 11  average_2weeks   86538 non-null  int64  
 12  median_forever   86538 non-null  int64  
 13  median_2weeks    86538 non-null  int64  
 14  price            86510 non-null  float64
 15  initialprice     86517 non-null  float64
 16  discount         86517 non-null  float64
 17  ccu              86538 non-null  int64  

### Check for Null Values

In [9]:
df.isnull().sum()

Unnamed: 0             0
appid                  0
name                  18
developer            311
publisher            614
score_rank         86491
positive               0
negative               0
userscore              0
owners                 0
average_forever        0
average_2weeks         0
median_forever         0
median_2weeks          0
price                 28
initialprice          21
discount              21
ccu                    0
dtype: int64
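Raw counts are easier to compare once they are expressed as a share of all rows. In the output above, `score_rank` is missing in 86491 of 86538 rows, which is more than 99%. The sketch below shows the calculation on a small toy frame; the frame and the name `missing_pct` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame with a mostly empty column, similar in spirit to score_rank.
sample = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "score_rank": [np.nan, np.nan, np.nan, 99.0],
    "price": [199.0, 499.0, np.nan, 999.0],
})

# The mean of a boolean mask is the fraction of missing values per column.
missing_pct = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

A column that is almost entirely empty, as `score_rank` appears to be here, is usually a candidate for dropping rather than imputing.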

### Handle Null Values: Replace with the Mean

In [10]:
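# Fill the few missing numeric values with each column's mean.
# The text columns and score_rank are left untouched.
df['price'] = df['price'].fillna(df['price'].mean())
df['initialprice'] = df['initialprice'].fillna(df['initialprice'].mean())
df['discount'] = df['discount'].fillna(df['discount'].mean())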

df.isnull().sum()

Unnamed: 0             0
appid                  0
name                  18
developer            311
publisher            614
score_rank         86491
positive               0
negative               0
userscore              0
owners                 0
average_forever        0
average_2weeks         0
median_forever         0
median_2weeks          0
price                  0
initialprice           0
discount               0
ccu                    0
dtype: int64
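The mean is sensitive to extreme values, and the summary statistics for this dataset show a maximum `price` of 99998 against a median of 499. As a hedged comparison, not a replacement for the approach used above, the sketch below fills one gap in a toy series with the mean and with the median to show how far apart the two can land.

```python
import numpy as np
import pandas as pd

# Toy prices with one missing value and one extreme value.
price = pd.Series([199.0, 499.0, np.nan, 999.0, 99998.0], name="price")

# Same fillna call, two different fill statistics.
mean_filled = price.fillna(price.mean())
median_filled = price.fillna(price.median())

print("mean fill:  ", mean_filled[2])
print("median fill:", median_filled[2])
```

With only a few dozen missing rows out of 86538, either choice has little effect on the overall distribution here; the difference matters more when many values are missing.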

### Check for Duplicate Values

In [11]:
df.duplicated().sum()

np.int64(0)
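A full-row check compares every column, including `Unnamed: 0`, which appears to hold a distinct value in every row. Under that assumption two rows can never be identical, so a result of 0 says little about repeated games. The sketch below shows a check restricted to `appid` on a toy frame; treating `appid` as the unique key is an assumption, not something the notebook states.

```python
import pandas as pd

# Toy frame: the same appid appears twice, but the leftover index column differs.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "appid": [10, 20, 10],
    "name": ["Counter-Strike", "Team Fortress Classic", "Counter-Strike"],
})

# Full-row comparison misses the repeat because "Unnamed: 0" is unique.
full_row_dupes = sample.duplicated().sum()

# Comparing on the assumed key column finds it.
appid_dupes = sample.duplicated(subset="appid").sum()

print(full_row_dupes, appid_dupes)
```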

### Summary of the Dataset

In [12]:
df.describe()

,Unnamed: 0,appid,score_rank,positive,negative,userscore,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,ccu
count,86538.0,86538.0,47.0,86538.0,86538.0,86538.0,86538.0,86538.0,86538.0,86538.0,86538.0,86538.0,86538.0,86538.0
mean,43268.5,1527852.0,99.234043,1470.32,242.0893,0.041542,235.661744,15.183838,195.83524,16.144769,753.596498,825.954992,5.349515,66.53134
std,24981.513135,905035.7,0.666358,32647.57,6194.164,1.827883,3034.440614,242.108848,3176.814575,260.329724,1275.62714,1340.945271,18.024946,3841.239
min,0.0,10.0,98.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,21634.25,776845.0,99.0,5.0,1.0,0.0,0.0,0.0,0.0,0.0,149.0,199.0,0.0,0.0
50%,43268.5,1380895.0,99.0,18.0,5.0,0.0,0.0,0.0,0.0,0.0,499.0,499.0,0.0,0.0
75%,64902.75,2207388.0,100.0,97.0,27.0,0.0,77.0,0.0,64.0,0.0,999.0,999.0,0.0,0.0
max,86537.0,3744990.0,100.0,7642084.0,1173003.0,100.0,608013.0,19953.0,608015.0,19953.0,99998.0,99998.0,100.0,1013936.0
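By default `describe()` reports only numeric columns, so `name`, `developer`, `publisher` and `owners` are absent from the table above. A minimal sketch of summarizing the non-numeric columns is shown below on a toy frame; the frame and the name `text_summary` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame with two text columns and one numeric column.
sample = pd.DataFrame({
    "developer": ["Valve", "Valve", "Gearbox Software", None],
    "publisher": ["Valve", "Valve", "Valve", "Valve"],
    "price": [199.0, 499.0, 499.0, 499.0],
})

# Excluding numeric dtypes leaves count, unique, top and freq for text columns.
text_summary = sample.describe(exclude=[np.number])
print(text_summary)
```

The `top` and `freq` rows give the most common value and how often it occurs, which is a quick way to see which developers or publishers dominate the data.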
