# Determine the Expected Interactions 2 years after launching an Indie Game

## Project 2 Metis 

### Data Cleaning Notebook  



In [131]:
import numpy as np
import pandas as pd
import datetime as dt
from ast import literal_eval

import md_proj2 as md  #code for this project outside the notebook


Import the raw merged data from raw_merged_steam_df.csv and check that it loaded correctly.

In [132]:
raw_merged_steam_df = pd.read_csv('raw_merged_steam_df.csv')

raw_merged_steam_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29233 entries, 0 to 29232
Data columns (total 59 columns):
Unnamed: 0                 29233 non-null int64
appid                      29233 non-null int64
name_x                     29228 non-null object
developer                  29036 non-null object
publisher                  28953 non-null object
score_rank                 58 non-null float64
positive                   29233 non-null int64
negative                   29233 non-null int64
userscore                  29233 non-null int64
owners                     29233 non-null object
average_forever            29233 non-null int64
average_2weeks             29233 non-null int64
median_forever             29233 non-null int64
median_2weeks              29233 non-null int64
price                      29204 non-null float64
initialprice               29211 non-null float64
discount                   29211 non-null float64
languages                  29139 non-null object
genre            

### Cleaning the data

First I will start by looking at the null values and removing unnecessary columns and those without enough useful data.


In [133]:
# drop rows with a null name_y and rows with duplicate names; name_y becomes the index in the next cell

raw_merged_steam_df = raw_merged_steam_df.dropna(subset = ['name_y'])
raw_merged_steam_df = raw_merged_steam_df.drop_duplicates(subset='name_y')

In [134]:
raw_merged_steam_df = raw_merged_steam_df.set_index('name_y')

In [135]:
# drop all columns with over 10k nulls

working_steam_df = raw_merged_steam_df.drop(columns=['website', 
                                                     'recommendations', 
                                                     'reviews', 
                                                     'metacritic', 
                                                     'demos', 
                                                     'ext_user_account_notice', 
                                                     'drm_notice', 
                                                     'legal_notice', 
                                                     'fullgame', 
                                                     'dlc', 
                                                     'controller_support', 
                                                     'score_rank'])
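
The list of sparse columns above was typed out by hand. As a minimal sketch, the same selection could be derived from the null counts directly. The toy frame, its values, and the `null_threshold` name are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for raw_merged_steam_df (values are made up).
toy_df = pd.DataFrame({
    "appid": [10, 20, 30, 40],
    "website": [np.nan, np.nan, np.nan, "example.com"],
    "price": [999.0, 499.0, np.nan, 0.0],
})

# Hypothetical cutoff for the toy data; the notebook uses roughly 10,000 nulls.
null_threshold = 2

# Count nulls per column and keep the names of columns above the cutoff.
null_counts = toy_df.isna().sum()
sparse_columns = null_counts[null_counts > null_threshold].index.tolist()

# Drop the sparse columns in one call.
trimmed_df = toy_df.drop(columns=sparse_columns)
print(sparse_columns)
```

Deriving the list this way keeps the drop step in sync with the data if the raw file is regenerated.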


In [136]:
working_steam_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 29083 entries, Counter-Strike to Rune Lord
Data columns (total 46 columns):
Unnamed: 0              29083 non-null int64
appid                   29083 non-null int64
name_x                  29080 non-null object
developer               28943 non-null object
publisher               28861 non-null object
positive                29083 non-null int64
negative                29083 non-null int64
userscore               29083 non-null int64
owners                  29083 non-null object
average_forever         29083 non-null int64
average_2weeks          29083 non-null int64
median_forever          29083 non-null int64
median_2weeks           29083 non-null int64
price                   29058 non-null float64
initialprice            29063 non-null float64
discount                29063 non-null float64
languages               29046 non-null object
genre                   28989 non-null object
ccu                     29083 non-null int64
tags        

### Columns to drop/keep, reason:

#### Keep
release_date, clean up to be just dates<br>
achievements, change to boolean<br>
movies, change to boolean<br>
categories, change to categorical<br> 
required_age, clean up to categories<br>
ccu, daily players<br>
genre, change to categorical<br>
discount,<br>
initialprice, <br>
owners, change to categorical (maybe log scale due to orders of magnitude)<br>
name_y, use as index<br>
languages, change to count of number of languages<br>
median_2weeks, median number of players in the last two weeks<br>
average_2weeks, average number of players in the last two weeks<br>
negative, number of reviews, combine with positive for total interactions<br>
positive, combine with negative for total interactions<br>
platforms, whether it is released on PC, Mac, and/or Linux<br>
median_forever, forever rating (would use to determine success or failure if different algorithm)<br>
average_forever, forever rating (would use to determine success or failure if different algorithm)<br>
publisher, redundant data.  Using "Indie" tag to determine game is indie or not.  Might be needed for something else<br>
developer, redundant data.  Using "Indie" tag to determine game is indie or not.  Might be needed for something else<br>

#### Drop
supported_languages, inconsistent data<br>
content_descriptors, 26966 are none <br>
background, link to background picture<br>
support_info, not useful - links to support pages<br>
genres, redundant<br>
package_groups, data is redundant - same as price, dlc, name, etc<br>
price_overview, redundant<br>
linux_requirements, not consistent data<br>
mac_requirements, not consistent data<br>
pc_requirements, not consistent data<br>
header_image, links to images<br>
short_description, description of game<br>
about_the_game, description of game<br>
detailed_description, description of game<br>
type, all the same: 'game'<br>
tags, redundant categories data<br>
userscore, 29175 0's<br>
name_x, redundant and irrelevant<br>
appid, initial identifier for merge but not needed<br>
is_free, redundant<br>
packages, not useable data<br>
screenshots, all games have multiple screenshots.  Not useful data<br>
publisher, redundant data.  Using "Indie" tag to determine game is indie or not.<br>
developer, redundant data.  Using "Indie" tag to determine game is indie or not.<br>
platforms, not reliable since it could have been updated since release<br>

In [137]:
stm_df_features = working_steam_df.drop(columns=['content_descriptors',
                                                 'name_x',
                                                  'background',
                                                  'support_info',
                                                  'genres',
                                                  'package_groups',
                                                  'price_overview',
                                                  'publishers',
                                                  'developers',
                                                  'pc_requirements',
                                                  'linux_requirements',
                                                  'mac_requirements',
                                                  'header_image',
                                                  'short_description',
                                                  'about_the_game',
                                                  'detailed_description',
                                                  'type',
                                                  'tags',
                                                  'userscore',
                                                  'appid',
                                                  'is_free',
                                                  'packages',
                                                  'screenshots',
                                                 'supported_languages',
                                                  'Unnamed: 0'])

#### Clean up the remaining columns


There is no reliable way to fill in these values by extrapolation, so drop the rows that still have nulls in the columns below.

In [138]:
stm_df_features = stm_df_features.dropna(subset = ['developer','genre','categories','release_date','languages'])

One potentially important piece of information is whether a game is self-published by its developer or released by a major publishing company. To separate out the major publishing labels, the developer and publisher columns have to be cleaned up: nulls are removed, and a missing developer is filled in with the publisher (and vice versa) where available.

In [139]:
# if developer is missing, fill in with publisher (and fill a missing publisher with developer)

stm_df_features['developer'] = stm_df_features['developer'].fillna(stm_df_features['publisher'])
stm_df_features['publisher'] = stm_df_features['publisher'].fillna(stm_df_features['developer'])



In [140]:
# fill missing required_age with 0 because of the high number of 0's in this feature
stm_df_features['required_age'] = stm_df_features['required_age'].fillna(0)

For the movies column, set the value to 1 if the game has a promotional movie on its Steam page and 0 otherwise.<br>Nulls are filled with 0.<br>
A quick check of the actual Steam pages showed that a null entry meant the game had no movie, whereas games with a movie always had a non-null entry.

In [141]:
stm_df_features['movies'] = stm_df_features['movies'].fillna('False')
stm_df_features['movies'] = stm_df_features['movies'].apply(lambda x : 0 if (x == 'False') else 1)
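
The fill-then-compare step above amounts to a null check. A minimal sketch of the same idea on a toy series, assuming (as the text states) that a null always means "no movie"; the sample strings are made up:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movies column: a non-null cell means a movie is listed.
movies = pd.Series(["[{'id': 1, 'name': 'Trailer'}]", np.nan, "[{'id': 2}]"])

# notna() gives True/False per row; casting to int yields the 1/0 flag.
has_trailer = movies.notna().astype(int)
print(has_trailer.tolist())
```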

In [142]:
stm_df_features.info()

<class 'pandas.core.frame.DataFrame'>
Index: 28309 entries, Counter-Strike to Rune Lord
Data columns (total 21 columns):
developer          28309 non-null object
publisher          28309 non-null object
positive           28309 non-null int64
negative           28309 non-null int64
owners             28309 non-null object
average_forever    28309 non-null int64
average_2weeks     28309 non-null int64
median_forever     28309 non-null int64
median_2weeks      28309 non-null int64
price              28309 non-null float64
initialprice       28309 non-null float64
discount           28309 non-null float64
languages          28309 non-null object
genre              28309 non-null object
ccu                28309 non-null int64
required_age       28309 non-null float64
platforms          28309 non-null object
categories         28309 non-null object
movies             28309 non-null int64
achievements       26186 non-null object
release_date       28309 non-null object
dtypes: float64(4), in

If the game has built-in achievements, change the value to 1; otherwise change it to 0. Fill nulls with 0.

Nulls are filled with 0 because a game with no built-in achievements would have an empty or null entry.

In [143]:
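# Fill nulls with the "no achievements" string; filling with the integer 0 would be mapped to 1 by the comparison below
stm_df_features['achievements'] = stm_df_features['achievements'].fillna("{'total': 0}")
stm_df_features['achievements'] = stm_df_features['achievements'].apply(lambda x: 0 if (x == "{'total': 0}") else 1)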
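
As an alternative to comparing against one exact string, the cell could be parsed and its count inspected. This is only a sketch: it assumes each non-null cell is the string form of a dict with a `total` key, and `has_achievements` is a hypothetical helper name.

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def has_achievements(raw):
    """Return 1 if the raw cell reports at least one achievement, else 0."""
    if pd.isna(raw):
        # A null cell is treated as "no achievements".
        return 0
    parsed = literal_eval(raw)
    return int(parsed.get("total", 0) > 0)


# Toy values: one game with achievements, one with zero, one null.
achievements = pd.Series(["{'total': 12}", "{'total': 0}", np.nan])
flags = achievements.apply(has_achievements)
print(flags.tolist())
```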

In [144]:
stm_df_features['categories'] = stm_df_features['categories'].apply(lambda x: ';'.join(item['description'] for item in literal_eval(x)))
stm_df_features['categories'] = stm_df_features['categories'].apply(lambda x: x.split(";")[0])
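
The two steps above join every description with `;` and then keep only the first one. A minimal sketch of taking the first description directly, assuming each cell is the string form of a list of dicts with a `description` key (as the `item['description']` lookup implies); the sample ids are illustrative:

```python
from ast import literal_eval

# Toy cell in the assumed raw format.
raw_categories = (
    "[{'id': 2, 'description': 'Single-player'}, "
    "{'id': 22, 'description': 'Steam Achievements'}]"
)

# Parse the string into Python objects and collect the descriptions in order.
descriptions = [item["description"] for item in literal_eval(raw_categories)]

# The first listed description is the one the notebook keeps.
first_category = descriptions[0]
print(first_category)
```

Skipping the join/split round trip also avoids surprises if a description ever contained a `;`.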



In [145]:
stm_df_features.categories.value_counts()

Single-player                 26779
Multi-player                    861
Online Multi-Player             306
Local Multi-Player               88
Steam Achievements               50
MMO                              49
Steam Workshop                   29
Partial Controller Support       26
Full controller support          18
Steam Cloud                      14
Shared/Split Screen              13
Steam Trading Cards              13
Includes level editor            11
VR Support                       11
In-App Purchases                 10
Co-op                             9
Captions available                5
Local Co-op                       5
Online Co-op                      4
Cross-Platform Multiplayer        3
Steam Leaderboards                3
Includes Source SDK               1
Stats                             1
Name: categories, dtype: int64

The category counts show that most games fall into one of 8 major categories. Drop all rows that are not in one of these major categories.

In [146]:
major_categories = ['Single-player', 'Multi-player', 'Online Multi-Player', 'Local Multi-Player', 'MMO', 'Local Co-op', 'Online Co-op', 'Co-op']

stm_df_features = stm_df_features[stm_df_features['categories'].isin(major_categories)]

In [147]:
stm_df_features['release_date'] = stm_df_features['release_date'].apply(md.eval_date)
stm_df_features['release_date'] = stm_df_features['release_date'].apply(md.parse_date)
stm_df_features['release_date'] = pd.to_datetime(stm_df_features['release_date'], format='%d %b %Y', errors='coerce')


Aug 26, 2011
Apr 23, 2019


In [148]:
stm_df_features.release_date.head()

name_y
Counter-Strike              2000-11-01
Team Fortress Classic       1999-04-01
Day of Defeat               2003-05-01
Deathmatch Classic          2001-06-01
Half-Life: Opposing Force   1999-11-01
Name: release_date, dtype: datetime64[ns]

Convert the dates to years, since I'm interested in the last two years of information for launched games.

In [149]:
stm_df_features['release_date'] = stm_df_features['release_date'].dt.year
stm_df_features = stm_df_features.dropna(subset = ['release_date'])
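
The helpers `md.eval_date` and `md.parse_date` live outside this notebook, so the following is only a sketch of one way the date-to-year step could work. The raw cell format (a dict string with a `date` key) and the helper name `extract_date_text` are assumptions, not the project's actual code.

```python
from ast import literal_eval

import pandas as pd


def extract_date_text(raw):
    """Hypothetical helper: pull the 'date' string out of a dict-like cell."""
    return literal_eval(raw)["date"]


# Toy cells in the assumed raw format; the second has no real date.
raw = pd.Series([
    "{'coming_soon': False, 'date': 'Aug 26, 2011'}",
    "{'coming_soon': True, 'date': 'Coming soon'}",
])

date_text = raw.apply(extract_date_text)

# Unparseable strings become NaT instead of raising.
parsed = pd.to_datetime(date_text, format="%b %d, %Y", errors="coerce")

# With NaT present, the extracted year is a float column containing NaN.
release_year = parsed.dt.year
print(release_year.tolist())
```

This would be consistent with release_date appearing as float64 in the `info()` output, since the year is extracted before the unparseable rows are dropped.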

In [150]:
stm_df_features.info()

<class 'pandas.core.frame.DataFrame'>
Index: 27762 entries, Counter-Strike to Rune Lord
Data columns (total 21 columns):
developer          27762 non-null object
publisher          27762 non-null object
positive           27762 non-null int64
negative           27762 non-null int64
owners             27762 non-null object
average_forever    27762 non-null int64
average_2weeks     27762 non-null int64
median_forever     27762 non-null int64
median_2weeks      27762 non-null int64
price              27762 non-null float64
initialprice       27762 non-null float64
discount           27762 non-null float64
languages          27762 non-null object
genre              27762 non-null object
ccu                27762 non-null int64
required_age       27762 non-null float64
platforms          27762 non-null object
categories         27762 non-null object
movies             27762 non-null int64
achievements       27762 non-null int64
release_date       27762 non-null float64
dtypes: float64(5), in

In [151]:
stm_df_features['publisher'].value_counts().head(5)

Big Fish Games    231
Strategy First    135
Ubisoft           113
THQ Nordic        100
Sekai Project      98
Name: publisher, dtype: int64

In [152]:
# Create a new column of combined data
new_dev_publisher_column = md.combine_and_binary(stm_df_features['developer'], stm_df_features['publisher'])
# Merge the new column into the df
stm_df_features = pd.merge(stm_df_features, new_dev_publisher_column, on='name_y')
# Rename the merged column
stm_df_features = stm_df_features.rename(columns={0:'Self_published'})
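
`md.combine_and_binary` is defined outside this notebook. One plausible reading, shown here purely as an assumption, is that a game counts as self-published when its developer and publisher are the same. The toy rows and their developer/publisher values are illustrative.

```python
import pandas as pd

# Toy frame indexed by name_y, mirroring the notebook's index.
toy = pd.DataFrame(
    {
        "developer": ["Valve", "Gearbox Software"],
        "publisher": ["Valve", "Valve"],
    },
    index=pd.Index(["Counter-Strike", "Half-Life: Opposing Force"], name="name_y"),
)

# Normalise case and whitespace before comparing the two columns.
developer_clean = toy["developer"].str.strip().str.lower()
publisher_clean = toy["publisher"].str.strip().str.lower()

# 1 when developer and publisher match, 0 otherwise.
toy["Self_published"] = (developer_clean == publisher_clean).astype(int)
print(toy["Self_published"].tolist())
```

Assigning the flag directly keeps the index aligned and avoids the separate merge and rename steps.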


In [153]:
stm_df_features = stm_df_features.drop(columns=['publisher','developer'])

In [154]:
stm_df_features.head()

name_y,positive,negative,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,languages,genre,ccu,required_age,platforms,categories,movies,achievements,release_date,Self_published
Counter-Strike,124534,3339,"10,000,000 .. 20,000,000",17612,709,317,26,999.0,999.0,0.0,"English, French, German, Italian, Spanish - Sp...",Action,14923,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2000.0,1
Team Fortress Classic,3318,633,"5,000,000 .. 10,000,000",277,15,62,15,499.0,499.0,0.0,"English, French, German, Italian, Spanish - Sp...",Action,87,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,1999.0,1
Day of Defeat,3416,398,"5,000,000 .. 10,000,000",187,0,34,0,499.0,499.0,0.0,"English, French, German, Italian, Spanish - Spain",Action,130,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2003.0,1
Deathmatch Classic,1273,267,"5,000,000 .. 10,000,000",258,0,184,0,499.0,499.0,0.0,"English, French, German, Italian, Spanish - Sp...",Action,4,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2001.0,1
Half-Life: Opposing Force,5250,288,"5,000,000 .. 10,000,000",624,0,415,0,499.0,499.0,0.0,"English, French, German, Korean",Action,71,0.0,"{'windows': True, 'mac': True, 'linux': True}",Single-player,0,0,1999.0,0


Use the low end of the estimated range as the number of owners.  
(Owners is the number of players who own the game but did not necessarily buy it, so the estimate is kept on the low end.)

In [155]:
stm_df_features['owners'] = stm_df_features['owners'].apply(lambda x: int(x.replace(',','').split()[0]))
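
A small sketch of the same parsing with both ends of the range exposed, which makes the choice of the lower bound explicit. `parse_owner_range` is a hypothetical helper, and the second sample range is illustrative.

```python
def parse_owner_range(owner_range):
    """Split a range such as '10,000,000 .. 20,000,000' into (low, high) ints."""
    low_text, high_text = owner_range.split("..")
    low = int(low_text.replace(",", "").strip())
    high = int(high_text.replace(",", "").strip())
    return low, high


low, high = parse_owner_range("10,000,000 .. 20,000,000")
print(low, high)
```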

I wanted to try to get something useful from genre as a game's playerbase could have an effect on interactions.  

I decided to split it into major categories again.  A quick internet search confirmed that a game was likely to fall under Action, Adventure, Casual, RPG, Racing, or Strategy.  Also, some games had no genre tag and therefore fell into the None category.  

Another interesting part about the genre data is that it also contains an Indie tag.  I will be using this to isolate indie games.  

In [156]:
stm_df_features.genre.value_counts()

Action, Indie                                                                      1930
Casual, Indie                                                                      1554
Action, Adventure, Indie                                                           1286
Adventure, Indie                                                                   1202
Action, Casual, Indie                                                              1042
                                                                                   ... 
Sexual Content, Nudity, Adventure, Indie, Simulation                                  1
Action, Adventure, Casual, Free to Play, Indie, RPG, Early Access                     1
Action, Adventure, Casual, Free to Play, Massively Multiplayer, RPG, Simulation       1
Sexual Content, Nudity, Violent, RPG, Strategy                                        1
Action, Sports, Strategy, Early Access                                                1
Name: genre, Length: 1476, dtype

In [157]:
stm_df_features['genre'] = stm_df_features['genre'].apply(lambda x: x.replace(',','').split())

In [158]:
stm_df_features['genre']

name_y
Counter-Strike                                  [Action]
Team Fortress Classic                           [Action]
Day of Defeat                                   [Action]
Deathmatch Classic                              [Action]
Half-Life: Opposing Force                       [Action]
                                         ...            
Room of Pandora               [Adventure, Casual, Indie]
Cyber Gun                     [Action, Adventure, Indie]
Super Star Blast                 [Action, Casual, Indie]
New Yankee 7: Deer Hunters    [Adventure, Casual, Indie]
Rune Lord                     [Adventure, Casual, Indie]
Name: genre, Length: 27762, dtype: object
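
Splitting on whitespace breaks multi-word genres such as "Free to Play" into separate tokens. A minimal sketch comparing that with a comma split, using a made-up genre string built from tags that appear in the value counts:

```python
genre_text = "Action, Free to Play, Massively Multiplayer, Early Access"

# The notebook's approach: strip commas, then split on whitespace.
whitespace_tokens = genre_text.replace(",", "").split()

# Alternative: split on commas and trim, which keeps multi-word genres whole.
comma_tokens = [genre.strip() for genre in genre_text.split(",")]

print(whitespace_tokens)
print(comma_tokens)
```

The `'Indie' in x` check gives the same answer under either split, because "Indie" is a single word.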

In [159]:
#Determine if game is tagged as Indie then remove the Indie tag

stm_df_features['Is_Indie'] = stm_df_features['genre'].apply(lambda x: 1 if 'Indie' in x else 0)
stm_df_features['genre'] = stm_df_features['genre'].apply(md.remove_item)
stm_df_features['Game_Genre'] = stm_df_features['genre'].apply(md.assign_genre)
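
`md.remove_item` and `md.assign_genre` are defined in `md_proj2`, which is not shown here. The sketch below is a guess at their general shape: the function names `remove_indie` and `assign_major_genre` are hypothetical, and the "first major genre listed wins" rule is an assumption.

```python
# Major genres named in the text above.
MAJOR_GENRES = ["Action", "Adventure", "Casual", "RPG", "Racing", "Strategy"]


def remove_indie(genres):
    """Return the genre list without the 'Indie' tag."""
    return [genre for genre in genres if genre != "Indie"]


def assign_major_genre(genres):
    """Return the first major genre in the list, or 'None' if there is none."""
    for genre in genres:
        if genre in MAJOR_GENRES:
            return genre
    return "None"


cleaned = remove_indie(["Indie", "Casual", "Strategy"])
print(cleaned, assign_major_genre(cleaned))
```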

In [160]:
stm_df_features['Game_Genre'].value_counts()

Action       12308
Adventure     5630
Casual        4665
None          2029
Strategy      1703
RPG           1079
Racing         348
Name: Game_Genre, dtype: int64

Since almost 20k games are tagged as Indie, I'm dropping the games that are not, because large developers would skew the model.

In [161]:
stm_df_features['genre'].value_counts()

[Action]                                                                                         2811
[Casual]                                                                                         2180
[Adventure]                                                                                      1768
[Action, Adventure]                                                                              1673
[Adventure, Casual]                                                                              1356
                                                                                                 ... 
[Gore, Casual, Simulation, Strategy]                                                                1
[Action, Adventure, Casual, Massively, Multiplayer, RPG, Simulation, Strategy, Early, Access]       1
[RPG, Simulation, Sports]                                                                           1
[Action, Adventure, Casual, Free, to, Play, Massively, Multiplayer, Simulation, Sp

In [162]:
# Categorized genre into Is_Indie and Game_Genre, removing this as it contains unused duplicate info

stm_df_features = stm_df_features.drop(columns=['genre'])

In [163]:
stm_df_features.head(2)

name_y,positive,negative,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,...,ccu,required_age,platforms,categories,movies,achievements,release_date,Self_published,Is_Indie,Game_Genre
Counter-Strike,124534,3339,10000000,17612,709,317,26,999.0,999.0,0.0,...,14923,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2000.0,1,0,Action
Team Fortress Classic,3318,633,5000000,277,15,62,15,499.0,499.0,0.0,...,87,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,1999.0,1,0,Action


The languages and platforms columns are formatted much like the genre column, so I split each into individual languages and platforms, then counted the number of languages and the number of platforms for each row. Note that the language split below is on whitespace, so a multi-word entry such as "Spanish - Spain" contributes more than one token to the count.

In [164]:
stm_df_features['languages'] = stm_df_features['languages'].astype(str)

In [165]:
stm_df_features['languages'] = stm_df_features['languages'].apply(lambda x: x.split())

In [166]:
stm_df_features['Number_Languages'] = stm_df_features['languages'].apply(lambda x: len(x))
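
A short sketch of how the whitespace split and a comma split differ on one of the language strings shown earlier in the notebook output:

```python
languages_text = "English, French, German, Italian, Spanish - Spain"

# Whitespace split counts tokens, including "-" and "Spain" separately.
token_count = len(languages_text.split())

# Comma split counts one entry per listed language.
language_count = len([part for part in languages_text.split(",") if part.strip()])

print(token_count, language_count)
```

If a true language count is wanted, the comma-based count is the closer match; the whitespace-based count is still a usable proxy as long as it is applied consistently.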

In [167]:
stm_df_features.head(1)

name_y,positive,negative,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,...,required_age,platforms,categories,movies,achievements,release_date,Self_published,Is_Indie,Game_Genre,Number_Languages
Counter-Strike,124534,3339,10000000,17612,709,317,26,999.0,999.0,0.0,...,0.0,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2000.0,1,0,Action,12


In [168]:
stm_df_features = stm_df_features.drop(columns=['languages'])

In [169]:
# Use the function convert_platforms to count how many platforms each game lists.

stm_df_features['Number_Platforms'] = stm_df_features['platforms'].apply(md.convert_platforms)
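
`md.convert_platforms` is not shown in this notebook. Assuming each cell is the string form of a dict of booleans, as in the `head()` output, a counting helper might look like the sketch below; `count_platforms` is a hypothetical name.

```python
from ast import literal_eval


def count_platforms(raw):
    """Count the platforms flagged True in a dict-like string cell."""
    flags = literal_eval(raw)
    return sum(1 for supported in flags.values() if supported)


all_three = count_platforms("{'windows': True, 'mac': True, 'linux': True}")
windows_only = count_platforms("{'windows': True, 'mac': False, 'linux': False}")
print(all_three, windows_only)
```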


In [170]:
stm_df_features.head()

name_y,positive,negative,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,...,platforms,categories,movies,achievements,release_date,Self_published,Is_Indie,Game_Genre,Number_Languages,Number_Platforms
Counter-Strike,124534,3339,10000000,17612,709,317,26,999.0,999.0,0.0,...,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2000.0,1,0,Action,12,3
Team Fortress Classic,3318,633,5000000,277,15,62,15,499.0,499.0,0.0,...,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,1999.0,1,0,Action,13,3
Day of Defeat,3416,398,5000000,187,0,34,0,499.0,499.0,0.0,...,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2003.0,1,0,Action,7,3
Deathmatch Classic,1273,267,5000000,258,0,184,0,499.0,499.0,0.0,...,"{'windows': True, 'mac': True, 'linux': True}",Multi-player,0,0,2001.0,1,0,Action,13,3
Half-Life: Opposing Force,5250,288,5000000,624,0,415,0,499.0,499.0,0.0,...,"{'windows': True, 'mac': True, 'linux': True}",Single-player,0,0,1999.0,0,0,Action,4,3


Rename the columns to more descriptive names.  

In [171]:
stm_df_features = stm_df_features.rename(columns = {'name_y':'Game_Name', 
                                                    'positive':'Positive_Reviews',
                                                    'negative':'Negative_Reviews',
                                                    'average_forever':'Average_Daily_Players',
                                                    'median_forever':'Median_Daily_Players',
                                                   'owners':'Approx_Owners',
                                                   'price':'Current_Price',
                                                   'initialprice':'Initial_Price',
                                                   'discount':'Discount_Percent',
                                                   'required_age':'Required_Age',
                                                   'categories':'Single_or_Multiplayer',
                                                   'movies':'Has_Game_Trailer',
                                                   'ccu':'Daily_Players',
                                                   'achievements':'Has_Achievements',
                                                    'platforms':'Platforms',
                                                   'release_date':'release_year'})


In [172]:
# Prices appear to be in cents (999.0 = $9.99), so 10000 corresponds to $100.
# A price of $100 or more is not realistic for a game.  Removing all those rows.

stm_df_features = stm_df_features[stm_df_features['Initial_Price'] < 10000]
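
A minimal sketch of the price filter with the unit made explicit. It assumes prices are stored in cents, which the sample values and the 10000 threshold suggest; the game names and the `MAX_PRICE_CENTS` constant are made up for illustration.

```python
import pandas as pd

# Toy prices in cents: $9.99, $4.99, and an unrealistic $199.99.
prices = pd.DataFrame(
    {"Initial_Price": [999.0, 499.0, 19999.0]},
    index=["Game A", "Game B", "Game C"],
)

# $100 expressed in cents.
MAX_PRICE_CENTS = 100 * 100

# Keep only rows priced below the cutoff.
kept = prices[prices["Initial_Price"] < MAX_PRICE_CENTS]

# Convert to dollars for readability.
kept_dollars = kept["Initial_Price"] / 100
print(kept_dollars.tolist())
```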



In [173]:
stm_df_features.info()

<class 'pandas.core.frame.DataFrame'>
Index: 27755 entries, Counter-Strike to Rune Lord
Data columns (total 22 columns):
Positive_Reviews         27755 non-null int64
Negative_Reviews         27755 non-null int64
Approx_Owners            27755 non-null int64
Average_Daily_Players    27755 non-null int64
average_2weeks           27755 non-null int64
Median_Daily_Players     27755 non-null int64
median_2weeks            27755 non-null int64
Current_Price            27755 non-null float64
Initial_Price            27755 non-null float64
Discount_Percent         27755 non-null float64
Daily_Players            27755 non-null int64
Required_Age             27755 non-null float64
Platforms                27755 non-null object
Single_or_Multiplayer    27755 non-null object
Has_Game_Trailer         27755 non-null int64
Has_Achievements         27755 non-null int64
release_year             27755 non-null float64
Self_published           27755 non-null int64
Is_Indie                 27755 non-null

In [174]:
# Exporting the cleaned file to csv.

stm_df_features.to_csv('steam_cleaned_data.csv')

# -------------------------Cleaned File Exported to CSV----------------------------

Cleaned file exported.  This is the end of this notebook.  The process continues in Project_2_Mark_Dziuban_EDA.