In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
whenamancodes_play_store_apps_path = kagglehub.dataset_download('whenamancodes/play-store-apps')

print('Data source import complete.')


In this notebook, we will learn:
* How to import Pandas
* How to create Pandas Series and DataFrames using various methods
* How to access and change elements in Series and DataFrames
* How to deal with missing values
* How to load data into a DataFrame and extract, filter and transform data
* How to calculate statistics and create visualizations
* Different methods and attributes that help with data manipulation and analysis

<br>
<img src = "https://files.realpython.com/media/Intro-to-Exploratory-Data-Analysis-With-Pandas_Watermarked.81a7d7df468f.jpg" width = "600"/>
<br>
<center><str><a target="_blank" href="https://files.realpython.com/media/Intro-to-Exploratory-Data-Analysis-With-Pandas_Watermarked.81a7d7df468f.jpg">Image Source</a></str></center>


<a id = "1."></a>
# 1. Introduction

<div class="alert alert-block alert-success">
Pandas is a package for the manipulation and analysis of data in Python. It is built on top of NumPy. The name pandas is derived from the econometrics term Panel Data. It is fast, powerful, flexible, open source and easy to use. Pandas adds two data structures to Python, namely the Pandas Series and the Pandas DataFrame. We can load data of different formats into DataFrames.</div>


<div class="alert alert-block alert-info">
Importing necessary libraries:</div>

In [None]:
import pandas as pd
import numpy as np

<a id = "2."></a>
# 2. Series

<div class="alert alert-block alert-info">
A pandas Series is a one-dimensional data structure which can store data such as strings, integers, floats and other Python objects. The <b>pandas.Series()</b> constructor is used to create a pandas Series. When a Series is displayed, the index labels appear in the first column and the data in the second column.
</div>


Let's see some examples:

<div class="alert alert-block alert-warning">
    If we don't specify the 'index' argument in <b>pd.Series</b> then the indices will be integers starting from 0.
</div>

In [None]:
my_list = ["Yes", "No", 12, 10]

my_series = pd.Series(data = my_list)

print(type(my_series))
my_series

<div class="alert alert-block alert-warning">
    We can add indices of our own choice by specifying the 'index' argument in <b>pd.Series</b>.
</div>

In [None]:
my_list = ["Yes", "No", 100, 10]
indices = ['milk', 'bread', 'apples', 'eggs']

groceries = pd.Series(data = my_list, index = indices)
groceries
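
A Series can also be built from a dictionary, in which case the keys become the index labels. The short sketch below reuses the grocery values from above; the variable name `groceries_from_dict` is only illustrative.

```python
import pandas as pd

# Dictionary keys become the index labels, values become the data
groceries_from_dict = pd.Series({"milk": "Yes", "bread": "No", "apples": 100, "eggs": 10})

print(groceries_from_dict)
print(groceries_from_dict.index.tolist())
```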

<div class="alert alert-block alert-warning">
 We can also get the shape, number of dimensions and size of a pandas Series by using <b>Series.attribute_name</b>.
</div>

In [None]:
print("groceries.shape: ", groceries.shape)
print("groceries.ndim: ", groceries.ndim) # number of dimensions
print("groceries.size: ", groceries.size)

<div class="alert alert-block alert-warning">
We can also get the values and indexes of a series:
</div>

In [None]:
print("groceries.values: ", groceries.values)
print()
print("groceries.index: ", groceries.index)

<div class="alert alert-block alert-warning">
We can also check whether an index label is available in the given Series.
</div>

In [None]:
print("bananas" in groceries)
print("milk" in groceries)
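
Note that the `in` operator on a Series looks at the index labels, not at the stored values. The following minimal sketch rebuilds the same `groceries` Series and contrasts the two checks; searching `groceries.values` is one possible way to test for a value.

```python
import pandas as pd

groceries = pd.Series(data=["Yes", "No", 100, 10],
                      index=["milk", "bread", "apples", "eggs"])

label_found = "milk" in groceries           # "milk" is an index label
value_as_label = "Yes" in groceries         # "Yes" is a value, not a label
value_found = "Yes" in groceries.values     # searches the stored values instead

print(label_found, value_as_label, value_found)
```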

## Accessing and Deleting Elements in Series

<div class="alert alert-block alert-info">
We can access elements in a Series using index labels inside square brackets [ ]. Pandas also provides the <b>loc</b> and <b>iloc</b> indexers to access elements by label and by integer position, respectively; negative positions count from the end of the Series. Integer positions inside plain square brackets also work on a label-indexed Series in older pandas versions, but this usage is deprecated in recent releases, so <b>iloc</b> is the safer choice.
</div>

Let's see some examples:

### Access elements using index labels

In [None]:
print(groceries)
print()

# We access elements in Groceries using index labels:

# We use a single index label
print("groceries['eggs']:\n", groceries['eggs'])
print()

# we can access multiple index labels
print("groceries[['milk', 'bread']]:\n", groceries[['milk', 'bread']])
print()

# we use loc to access multiple index labels
print("groceries.loc[['eggs', 'apples']]:\n", groceries.loc[['eggs', 'apples']])
print()

# We access elements in groceries using numerical positions.
# Plain brackets with integers (e.g. groceries[0]) are deprecated for a
# label-indexed Series in recent pandas versions, so we use iloc throughout.

# we use multiple numerical indices
print("groceries.iloc[[0, 1]]:\n", groceries.iloc[[0, 1]])
print()

# We use a negative numerical index
print("groceries.iloc[[-1]]:\n", groceries.iloc[[-1]])
print()

# We use a single numerical index
print("groceries.iloc[0]:\n", groceries.iloc[0])
print()

# we use iloc to access multiple numerical indices
print("groceries.iloc[[2, 3]]:\n", groceries.iloc[[2, 3]])

### Changing elements of a Series

<div class="alert alert-block alert-info">
Pandas Series is mutable and we can change its elements.
</div>

Let's see an example:

In [None]:
print("Before changing:\n")
print(groceries)
print()

groceries['apples'] = 12

print("After changing:\n")
print(groceries)

### Deleting an element of a Series

<div class="alert alert-block alert-info">
    We can delete an element from a Series using <b>drop()</b> method.
</div>

Let's see an example:

<div class="alert alert-block alert-warning">
By default, the <b>drop()</b> method returns a new Series and leaves the original Series unchanged. To delete an element from the original Series, we pass the argument 'inplace = True'.
</div>

In [None]:
print("Before deleting:\n")
print(groceries)
print()

groceries.drop('eggs', inplace = True)

print("After deleting:\n")
print(groceries)
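
For comparison, here is a small sketch of the default behaviour, where `drop()` is called without `inplace = True`. The name `without_eggs` is only illustrative.

```python
import pandas as pd

groceries = pd.Series(data=["Yes", "No", 12, 10],
                      index=["milk", "bread", "apples", "eggs"])

# Without inplace=True, drop() returns a new Series and keeps the original intact
without_eggs = groceries.drop("eggs")

print("eggs" in groceries)
print("eggs" in without_eggs)
```

Assigning the result to a new name keeps both versions available, which is often easier to reason about than modifying a Series in place.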

### Arithmetic Operations on Pandas Series

<div class="alert alert-block alert-info">
We can do arithmetic operations between Pandas Series and single numbers. We can also apply mathematical functions from NumPy and perform arithmetic operations on selected elements.
</div>

Let's see some examples:

In [None]:
veggies = pd.Series(data = [6, 18, 33], index = ["Tomatoes", "Carrots", "Potatoes"])
print(veggies)
print()

print("veggies + 2:\n", veggies + 2)
print()

print("veggies - 2:\n", veggies - 2)
print()

print("veggies * 2:\n", veggies * 2)
print()

print("veggies / 2:\n", veggies / 2)

In [None]:
print(veggies)
print()

print("np.exp(veggies):\n", np.exp(veggies))
print()

print("np.power(veggies, 2):\n", np.power(veggies, 2))
print()

print("np.sqrt(veggies):\n", np.sqrt(veggies))

In [None]:
print(veggies)
print()

print("veggies['Tomatoes'] + 10: ", veggies['Tomatoes'] + 10)
print()

print("veggies.iloc[2] - 5: ", veggies.iloc[2] - 5)
print()

print("veggies[['Carrots', 'Potatoes']] * 2: ", veggies[['Carrots', 'Potatoes']] * 2)
print()

print("veggies.loc[['Carrots', 'Potatoes']] / 2:\n", veggies.loc[['Carrots', 'Potatoes']] / 2)

<div class="alert alert-block alert-info">
 We can also do arithmetic operations on a Pandas Series of mixed data types, provided that the arithmetic operation is defined for all data types in the Series; otherwise we will get an error. Multiplication by 2, for example, is defined for both strings (repetition) and numbers.
</div>

In [None]:
my_list = ["Yes", "No", 100, 10]
indices = ['milk', 'bread', 'apples', 'eggs']

groceries = pd.Series(data = my_list, index = indices)
print(groceries)
print()

print("groceries * 2:\n", groceries * 2)
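
To illustrate the failure case, the sketch below tries a division on the same mixed Series. Division is not defined for strings, so the operation is expected to raise a `TypeError`, which is caught here only to keep the example running.

```python
import pandas as pd

groceries = pd.Series(data=["Yes", "No", 100, 10],
                      index=["milk", "bread", "apples", "eggs"])

try:
    halved = groceries / 2          # "Yes" / 2 is not defined
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print("Division failed:", err)
```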

<a id = '3.'></a>
# 3. DataFrames

<div class="alert alert-block alert-info">
A Pandas DataFrame is a two-dimensional data structure with labeled rows and columns that can hold many data types. It is made up of many Pandas Series. A DataFrame is similar to an Excel spreadsheet. A Pandas DataFrame can be created manually or can be loaded from a file.
<br>  
<br>
The <b>pandas.DataFrame()</b> constructor is used to create a DataFrame manually. We will first create a dictionary and then pass it to <b>pandas.DataFrame()</b> as an argument.
</div>

<br>
<img src = "https://files.realpython.com/media/A-Guide-to-Pandas-Dataframes_Watermarked.7330c8fd51bb.jpg" width = "600"/>
<br>
<center><str><a target="_blank" href="https://files.realpython.com/media/A-Guide-to-Pandas-Dataframes_Watermarked.7330c8fd51bb.jpg">Image Source</a></str></center>


Let's see an example:

In [None]:
items = {
    "Ali" : pd.Series(data = [101, 34, 78], index = ['cars', 'bikes', 'pants']),
    "Bilal" : pd.Series(data = [87, 11, 7, 55], index = ['books', 'pencils', 'apples', 'peanuts'])
}

df = pd.DataFrame(items)
print(type(df))
df

<div class="alert alert-block alert-info">
In the above example, we see that the row labels of the 'df' DataFrame are the union of both Series indexes, ordered alphabetically rather than in the order we gave in the dictionary. You can also see that there are some <b>NaN</b> values. But what are they?
<br>
    <b>NaN</b> stands for <i>Not a Number</i>, and is the way pandas indicates missing values.

We can also pass lists as dictionary values instead of Pandas Series:
</div>

In [None]:
data = {"Apples" : [11, 22, 33, 101, 302],
       "Mangoes" : [99, 77, 44, 45, 100]}

fruits_data = pd.DataFrame(data)

fruits_data

### Creating a DataFrame using a dictionary of lists, and custom row-indexes (labels)

In [None]:
data = {"Integers" : [1, 2, 3],
       "Floats" : [3.4, 0.1, 5.6],
       "Strings" : ["A", "B", "C"]}

df = pd.DataFrame(data, index = ["DataType 1", "DataType 2", "DataType 3"])

df

### Creating a DataFrame using a list of dictionaries

In [None]:
items = [{'socks' : 10, 'shoes' : 5, 'watches' : 12},
        {'bikes' : 2, 'cycles' : 5, 'cars' : 3, 'trains' : 8}]

data = pd.DataFrame(items, index = ["collection 1", "collection 2"])
data

## Attributes of a DataFrame

Let's see some attributes of a DataFrame and print some information about fruits_data:

In [None]:
print("fruits_data.shape: ", fruits_data.shape)
print("fruits_data.ndim: ", fruits_data.ndim)
print("fruits_data.size: ", fruits_data.size)
print()
print("fruits_data.values:\n", fruits_data.values)
print()
print("fruits_data.index: ", fruits_data.index)
print("fruits_data.columns: ", fruits_data.columns)

<div class="alert alert-block alert-info">
We can also choose which rows and columns to include in a DataFrame by passing the <i>index</i> and <i>columns</i> arguments:
</div>

In [None]:
items = {
    "Ali" : pd.Series(data = [101, 34, 78], index = ['cars', 'bikes', 'pants']),
    "Bilal" : pd.Series(data = [87, 11, 7, 55], index = ['books', 'pencils', 'apples', 'peanuts'])
}

df = pd.DataFrame(items, columns = ["Bilal"], index = ['apples', 'bikes', 'books'])
df

## Accessing and Adding elements in DataFrame

<div class="alert alert-block alert-info">
Like Pandas Series, we can also access elements in a DataFrame by using index labels and column names. <b>loc[ ]</b> and <b>iloc[ ]</b> are also used for accessing elements.
</div>

Let's have some examples:

In [None]:
items = [{'socks' : 10, 'shoes' : 5, 'watches' : 12},
        {'bikes' : 2, 'cycles' : 5, 'cars' : 3, 'trains' : 8}]

data = pd.DataFrame(items, index = ["collection 1", "collection 2"])
data

In [None]:
data[['bikes']]

In [None]:
data[['bikes', 'socks', 'cycles']]

In [None]:
data.loc[['collection 1']]

<div class="alert alert-block alert-warning">
If we want to access an individual element in a DataFrame with chained square brackets, the column label must come first, followed by the row label.
</div>

In [None]:
data['cars']['collection 2']
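
The same element can also be reached in a single step with the label-based and position-based indexers. The sketch below rebuilds the `data` DataFrame and compares the approaches; note that `loc`, `at` and `iloc` all take the row first and the column second.

```python
import pandas as pd

items = [{'socks': 10, 'shoes': 5, 'watches': 12},
         {'bikes': 2, 'cycles': 5, 'cars': 3, 'trains': 8}]
data = pd.DataFrame(items, index=["collection 1", "collection 2"])

chained = data['cars']['collection 2']                   # column first, then row
with_loc = data.loc['collection 2', 'cars']              # row label, then column label
with_at = data.at['collection 2', 'cars']                # scalar lookup by labels
with_iloc = data.iloc[1, data.columns.get_loc('cars')]   # row position, column position

print(chained, with_loc, with_at, with_iloc)
```

A single `loc` or `at` call is usually preferable to chained brackets, especially when assigning values, because it operates directly on the original DataFrame.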

## Adding a column to an existing DataFrame

In [None]:
data['jets'] = [9, 4]

data

### Add a new column based on the arithmetic operation between existing columns of a DataFrame

<div class="alert alert-block alert-info">
We can also add new columns to our DataFrame by using arithmetic operations between other columns in our DataFrame.
</div>

Let's see an example:

In [None]:
data["foot wear"] = data["socks"] + data["shoes"]
data

### Adding new column at a specific location

<div class="alert alert-block alert-info">
We can also add a new column at any position we want by using the <b>DataFrame.insert(loc, column, value)</b> method, where <i>loc</i> is the integer position of the new column.
</div>

For example:

In [None]:
data.insert(2, 'drinks', [3, 4])
data

## Deleting column from a DataFrame

<div class="alert alert-block alert-danger">
    We can delete columns with the <b>DataFrame.pop()</b> and <b>DataFrame.drop()</b> methods. <b>pop()</b> removes a single column from the original DataFrame and returns it.
</div>

For example:

In [None]:
data.pop("cars")
data

### Deleting multiple columns from a DataFrame

<div class="alert alert-block alert-danger">
    With the <b>DataFrame.drop()</b> method, we can delete multiple columns from a DataFrame. Here we will specify <i>axis = 1</i> in order to delete columns.
</div>

In [None]:
data = data.drop(['socks', 'bikes'], axis = 1)
data

### Deleting rows from a DataFrame

<div class="alert alert-block alert-danger">
    We can also delete rows from a DataFrame with the help of <b>DataFrame.drop()</b>. Here we will specify <i>axis = 0</i> in order to delete rows.
</div>

In [None]:
data = data.drop(["collection 2"], axis = 0)
data

### Renaming the column name in a DataFrame

<div class="alert alert-block alert-info">
    We can rename columns with the <b>rename()</b> method by passing a dictionary of old and new names to its <i>columns</i> argument.
</div>

In [None]:
data = data.rename(columns = {"jets" : "planes"})
data

### Renaming row label in a DataFrame

<div class="alert alert-block alert-info">
    We can also rename a row label by using <b>rename()</b> method and giving <i>index</i> as an argument.
</div>

In [None]:
data = data.rename(index = {"collection 1" : "row 1"})
data

### Setting column values as row index

<div class="alert alert-block alert-info">
    We can also set a specific column to the row index by using <b>set_index()</b> method.
</div>

In [None]:
data = data.set_index("shoes")
data

<a id = '4.'></a>
# 4. Dealing with NaN

### Counting NaN down the column

<div class="alert alert-block alert-info">
    Pandas provides several methods for counting NaN values. <b>DataFrame.isnull()</b> returns True for every value that is NaN and False otherwise. <b>DataFrame.isnull().sum()</b> returns the number of NaN values in each column, and <b>DataFrame.isnull().sum().sum()</b> returns the total number of NaN values in the DataFrame.
</div>

Let's have an example:

In [None]:
data = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes':8, 'suits':45},
{'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5, 'shirts': 2, 'shoes':5, 'suits':7},
{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes':10}]

df = pd.DataFrame(data, index = ["shop 1", "shop 2", "shop 3"])
df

In [None]:
df.isnull()

In [None]:
df.isnull().sum()

In [None]:
print("Total number of NaN values = ", df.isnull().sum().sum())

<div class="alert alert-block alert-warning">
    We can also count the number of Non-NaN values in a DataFrame using <b>DataFrame.count()</b> method.
</div>

In [None]:
df.count()

In [None]:
print("Total number of Non-NaN values = ", df.count().sum())
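
The counts above run down each column. As a small sketch, the same idea can be applied across each row by passing `axis = 1` to `sum()`, which shows how many values each shop is missing. The name `nan_per_row` is only illustrative.

```python
import pandas as pd

data = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
        {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
        {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
df = pd.DataFrame(data, index=["shop 1", "shop 2", "shop 3"])

# Count NaN values across the columns of each row
nan_per_row = df.isnull().sum(axis=1)
print(nan_per_row)
```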

## Eliminating NaN Values

<div class="alert alert-block alert-warning">
As we've counted the number of NaN values, now we can either drop or replace them.
</div>

### Dropping rows having NaN values

<div class="alert alert-block alert-info">
    We can remove NaN values with the help of the <b>dropna()</b> method. With <i>axis = 0</i>, it removes rows that contain NaN values, and with <i>axis = 1</i>, it removes columns that contain NaN values. Passing <i>inplace = True</i> modifies the original DataFrame instead of returning a new one.
</div>

In [None]:
# Drop rows having NaN values
df.dropna(axis = 0, inplace = True)
df

In [None]:
data = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes':8, 'suits':45},
{'watches': 10, 'glasses': 50, 'bikes': 15, 'pants':5, 'shirts': 2, 'shoes':5, 'suits':7},
{'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes':10}]

df = pd.DataFrame(data, index = ["shop 1", "shop 2", "shop 3"])
df

In [None]:
# Drop columns having NaN values
dummy_df = df.dropna(axis = 1)
dummy_df

## Replacing NaN values

<div class="alert alert-block alert-info">
We can also replace NaN values with any value we want with the help of <b>fillna()</b> method.
</div>

In [None]:
# Replacing all NaN values with 0
dummy_df = df.fillna(0)
dummy_df
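
A constant such as 0 is not always a sensible replacement. One common alternative, sketched below under the assumption that all columns are numeric, is to fill each column with its own mean by passing `df.mean()` to `fillna()`. The name `mean_filled` is only illustrative.

```python
import pandas as pd

data = [{'bikes': 20, 'pants': 30, 'watches': 35, 'shirts': 15, 'shoes': 8, 'suits': 45},
        {'watches': 10, 'glasses': 50, 'bikes': 15, 'pants': 5, 'shirts': 2, 'shoes': 5, 'suits': 7},
        {'bikes': 20, 'pants': 30, 'watches': 35, 'glasses': 4, 'shoes': 10}]
df = pd.DataFrame(data, index=["shop 1", "shop 2", "shop 3"])

# df.mean() is a Series of column means; fillna() matches it to columns by label
mean_filled = df.fillna(df.mean())
print(mean_filled)
```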

<div class="alert alert-block alert-info">
We can also replace NaN values with the previous values in the DataFrame. This is called forward filling. It is done either with <b>DataFrame.ffill()</b> or by passing <i>method = 'ffill'</i> to <b>fillna()</b> (deprecated in recent pandas versions); the <i>axis</i> argument sets the direction of the fill.</div>

In [None]:
# forward filling down axis = 0
# ffill() is the current equivalent of fillna(method = 'ffill')
dummy_df = df.ffill(axis = 0)
dummy_df

<div class="alert alert-block alert-warning">
    Notice that the NaN value in <i>shop 1</i> did not get replaced with the previous value because there is no previous value in <i>glasses</i> column.
</div>

In [None]:
# forward filling across axis = 1
# ffill() is the current equivalent of fillna(method = 'ffill')
dummy_df = df.ffill(axis = 1)
dummy_df

<div class="alert alert-block alert-warning">
 Here all the NaN values have been replaced with the value from the previous column in the same row.
</div>

<div class="alert alert-block alert-info">
    We can also do backward filling, either with <b>DataFrame.bfill()</b> or by passing <i>method = 'backfill'</i> to <b>fillna()</b> (deprecated in recent pandas versions).
</div>

In [None]:
# backward filling down axis = 0
# bfill() is the current equivalent of fillna(method = 'backfill')
dummy_df = df.bfill(axis = 0)
dummy_df

<div class="alert alert-block alert-warning">
Here all NaN values have been replaced with the next values in their columns, except for the two NaN values in <i>shop 3</i>, because no rows follow them in their columns.
</div>

In [None]:
# backward filling across axis = 1
# bfill() is the current equivalent of fillna(method = 'backfill')
dummy_df = df.bfill(axis = 1)
dummy_df

<a id = '5.'></a>
# 5. Loading Data into DataFrame

Now we will load data from a CSV file into a Pandas DataFrame using **pandas.read_csv()**. In this notebook we will be using the [Google Play Store](https://www.kaggle.com/datasets/whenamancodes/play-store-apps) dataset.

In [None]:
data = pd.read_csv("/kaggle/input/play-store-apps/googleplaystore.csv")
data

<div class="alert alert-block alert-info">
    We can print the first and last 5 rows of a dataset by using the <b>DataFrame.head()</b> and <b>DataFrame.tail()</b> methods, respectively. We can also optionally use <b>DataFrame.head(N)</b> or <b>DataFrame.tail(N)</b> to display the first and last <i>N</i> rows of the data, respectively.
</div>

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data.head(10)

<div class="alert alert-block alert-info">
    We can also get the shape of data by using <b>shape</b> attribute. This will return a tuple where value at index 0 will be number of rows and value at index 1 will be number of columns.</div>

In [None]:
data.shape

In [None]:
print("Rows: ", data.shape[0])
print("Columns: ", data.shape[1])

### Dealing with NaN values

<div class="alert alert-block alert-info">
Let's do a quick check to see whether we have any NaN values in our dataset. For this we will be using the <b>DataFrame.isnull()</b> method followed by the <b>any()</b> method. This will return a boolean for each column label.</div>

In [None]:
data.isnull().any()

<div class="alert alert-block alert-warning">
Since our dataset contains NaN values, we will remove the rows that have them.</div>

In [None]:
data.dropna(axis = 0, inplace = True)
data.head()

In [None]:
data['Current Ver'].value_counts()

In [None]:
data.isnull().any()
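
Dropping every row with any NaN can discard a lot of data. As a hedged sketch of a gentler option, `dropna()` accepts a `subset` argument that limits the check to chosen columns. The tiny `apps` table below is hypothetical and only stands in for the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the app table, for illustration only
apps = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.5, np.nan, 3.9, 4.1],
    "Current Ver": ["1.0", "2.1", np.nan, "3.3"],
})

all_clean = apps.dropna(axis=0)                # drops every row with any NaN
rated_only = apps.dropna(subset=["Rating"])    # drops only rows missing a Rating

print(len(all_clean), len(rated_only))
```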

### Descriptive Statistics of the DataFrame

<div class="alert alert-block alert-info">
    We can also get the descriptive statistics of the DataFrame using <b>DataFrame.describe()</b> method.
</div>

In [None]:
data.describe()

<div class="alert alert-block alert-info">
    We can apply <b>describe()</b> method on a single column as well.</div>

In [None]:
data["Rating"].describe()

<div class="alert alert-block alert-info">
We can also call the individual statistical methods that pandas provides, such as <b>min()</b>, <b>max()</b> and <b>mean()</b>.
</div>

Let's see some examples:

In [None]:
data["Rating"].min()

In [None]:
data["Rating"].max()

In [None]:
data["Rating"].mean()

<div class="alert alert-block alert-info">
    We can also get the information about the DataFrame using <b>info()</b> method.</div>

In [None]:
data.info()

## Sorting rows of the DataFrame

<div class="alert alert-block alert-info">
    We can sort the rows of a DataFrame using the <b>sort_values()</b> method. In some cases where we have the same value (this is common if we sort on a categorical variable), we may wish to break the ties by sorting on another column. We can sort on multiple columns in this way by passing a list of column names. The <b>sort_values()</b> method sorts values in ascending order by default, but we can sort values in descending order by passing the argument <i>ascending = False</i>.
</div>

Let's see some examples:

In [None]:
# sorting in ascending order
data_by_content =  data.sort_values("Content Rating")
data_by_content.head()

In [None]:
# sorting in descending order
data_by_content =  data.sort_values("Content Rating", ascending = False)
data_by_content.head()

In [None]:
# sorting by multiple columns
sorted_data = data.sort_values(["Content Rating", "Genres"])
sorted_data.head()
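
When sorting on several columns, `ascending` can also be a list with one flag per column. The sketch below uses a small hypothetical `apps` table (not the real dataset) to sort by content rating alphabetically and, within each group, by rating from highest to lowest.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
apps = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Content Rating": ["Teen", "Everyone", "Teen", "Everyone"],
    "Rating": [4.1, 3.8, 4.7, 4.4],
})

# One ascending flag per sort column: groups A-Z, ratings high to low
ranked = apps.sort_values(["Content Rating", "Rating"], ascending=[True, False])
print(ranked)
```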

### Subsetting Columns of the DataFrame

<div class="alert alert-block alert-info">
We can subset a single or multiple columns by using square brackets "[ ]".
</div>

Let's have some examples:

In [None]:
sub_data = data[['Category']]
sub_data.head()

In [None]:
# Creating a DataFrame that only contains 'Current Ver' & 'Android Ver'
ver_data = data[["Current Ver", "Android Ver"]]
ver_data.head()

### Subsetting or filtering rows of the DataFrame

<div class="alert alert-block alert-info">
We can also filter rows of the DataFrame that match some criteria. There are many ways to subset a DataFrame; perhaps the most common is to use relational operators to return True or False for each row, then pass that inside square brackets. We can also filter for multiple conditions at once by wrapping each condition in parentheses and combining them with the & operator.
</div>

Let's have some examples:

In [None]:
# filtering rows that have a Rating of 4.0 or above
rating_data = data[data["Rating"] >= 4.0]
rating_data.head()

In [None]:
rating_type_data = data[(data["Rating"] >= 4.5) & (data["Type"] == "Paid")]
rating_type_data.head()

<div class="alert alert-block alert-info">
    Similarly, we can subset data based on categorical variables with the <b>isin()</b> method. This can also be done with the <i>or</i> operator '|', but writing every condition separately gets tedious. The <b>isin()</b> method does the same job in one line of code.
</div>

In [None]:
categories_list = ["BEAUTY", "EVENTS", "FAMILY", "COMICS"]
condition = data["Category"].isin(categories_list)
data_by_categories = data[condition]
data_by_categories.head()
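
To make the equivalence concrete, the sketch below filters a small hypothetical `apps` table (standing in for the real dataset) both ways, and also shows that `~` negates the condition to exclude the listed categories.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
apps = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Category": ["BEAUTY", "GAME", "EVENTS", "TOOLS"],
})

wanted = ["BEAUTY", "EVENTS"]
with_or = apps[(apps["Category"] == "BEAUTY") | (apps["Category"] == "EVENTS")]
with_isin = apps[apps["Category"].isin(wanted)]
excluded = apps[~apps["Category"].isin(wanted)]   # ~ inverts the boolean mask

print(with_or.equals(with_isin))
print(excluded)
```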

### Dropping Duplicates

<div class="alert alert-block alert-info">
Dropping duplicates is an important part of our analysis because we don't want to count the same thing multiple times. This can be done by using the <b>drop_duplicates()</b> method. By default a row counts as a duplicate only if all of its columns match an earlier row; passing the <i>subset</i> argument with a list of columns restricts the comparison to those columns.</div>

In [None]:
non_dup_app = data.drop_duplicates(subset = ["App"])
non_dup_app

In [None]:
print("Rows before dropping duplicates: ", data.shape[0])
print("Rows after dropping duplicates: ", non_dup_app.shape[0])
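
Which of the duplicated rows survives is controlled by the `keep` argument, which defaults to `"first"`. The sketch below uses a small hypothetical table to compare it with `keep = "last"`; the column values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows: app "A" appears twice with different review counts
apps = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Reviews": [100, 150, 20],
})

first_kept = apps.drop_duplicates(subset=["App"])               # keeps the first "A" row
last_kept = apps.drop_duplicates(subset=["App"], keep="last")   # keeps the last "A" row

print(first_kept)
print(last_kept)
```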

### Counting Categorical Variables

<div class="alert alert-block alert-info">
    We can count how often each category occurs in a column with the <b>value_counts()</b> method. Passing <i>normalize = True</i> returns the proportion of each category instead of the count. The results are sorted by frequency by default, which corresponds to <i>sort = True</i>.
</div>

In [None]:
app_counts = non_dup_app["Category"].value_counts(sort = True)
app_counts

In [None]:
app_counts = non_dup_app["Category"].value_counts(normalize = True, sort = True)
app_counts

### Splitting data into Groups

<div class="alert alert-block alert-info">
We can also split data into groups and apply functions to these groups. This can be done with the <b>groupby()</b> method.
</div>

Let's see some examples:

In [None]:
# grouping data by "Type" and then calculating mean for each Type
grouped_type  = data.groupby("Type")["Rating"].mean()
grouped_type

In [None]:
grouped_category = data.groupby("Category")["Rating"].mean()
grouped_category

### Multiple Grouped Summaries

<div class="alert alert-block alert-info">
    We can also calculate multiple grouped summary statistics using the <b>agg()</b> method. The <b>agg()</b> method applies a function or a list of functions to a Series or a column of a DataFrame.</div>

In [None]:
# string names avoid the warning recent pandas versions emit for NumPy callables here
grouped_category = data.groupby("Category")["Rating"].agg(["mean", "median", "min", "max"])
grouped_category
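
The output columns can also be given custom names by passing keyword arguments to `agg()`, a feature pandas calls named aggregation. The sketch below uses a small hypothetical table, and the names `avg_rating` and `best_rating` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
apps = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS"],
    "Rating": [4.0, 5.0, 3.0],
})

# Each keyword becomes a column name; each value names the aggregation to apply
summary = apps.groupby("Category")["Rating"].agg(avg_rating="mean", best_rating="max")
print(summary)
```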

### Setting and Removing index

<div class="alert alert-block alert-info">
As discussed earlier, we can set and remove any column as an index of a dataframe.
</div>

Let's see an example:

In [None]:
dummy_data = data.set_index("App")
dummy_data.head()

In [None]:
reset_data = dummy_data.reset_index()
reset_data.head()

<div class="alert alert-block alert-warning">
We can also reset the index and drop its contents.</div>

In [None]:
# this will reset the index and remove the App column
dummy_data_drop = dummy_data.reset_index(drop = True)
dummy_data_drop

## Slicing and Indexing

<div class="alert alert-block alert-info">
    Now let's see subsetting with <b>loc[ ]</b> and <b>iloc[ ]</b> on our dataset.
</div>

In [None]:
dummy_data.loc["Google Play Games", "Type"]

In [None]:
dummy_data.loc["Sketch - Draw & Paint" : "Infinite Painter", "Installs" : "Current Ver"]

In [None]:
dummy_data.iloc[3:8, 4:11]

In [None]:
dummy_data.iloc[3:8, 4:]
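
One detail worth keeping in mind when slicing: `loc` slices include the end label, while `iloc` slices exclude the end position, as with ordinary Python slicing. The sketch below shows the difference on a small hypothetical table indexed by app name.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
apps = pd.DataFrame(
    {"Type": ["Free", "Paid", "Free", "Free"],
     "Installs": ["1,000+", "500+", "10,000+", "100+"]},
    index=["A", "B", "C", "D"],
)

by_label = apps.loc["B":"C"]      # label slice: includes both "B" and "C"
by_position = apps.iloc[1:3]      # position slice: rows 1 and 2, stops before 3

print(by_label.index.tolist())
print(by_position.index.tolist())
```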

<a id = '6.'></a>
# 6. Visualizing DataFrames

<div class="alert alert-block alert-info">
We can also visualize the content of dataframes using different methods. We can create plots with <b>plot()</b> method, then we will specify the kind of plot by passing <i>kind</i> argument.</div>

Let's see some examples:

Let's first convert:
   * _Everyone_ to **0**
   * _Mature 17+_ to **1**
   * _Teen_ to **2**
   * _Everyone 10+_ to **3**
   * _Adults only 18+_ to **4**
   * _Unrated_ to **5**
   
in "Content Rating" column.

In [None]:
# Map each Content Rating label to its integer code in a single replace() call
rating_codes = {'Everyone': 0, 'Mature 17+': 1, 'Teen': 2,
                'Everyone 10+': 3, 'Adults only 18+': 4, 'Unrated': 5}
dummy_data['Content Rating'] = dummy_data['Content Rating'].replace(rating_codes)

In [None]:
dummy_data.plot(x = "Content Rating", y = "Rating", kind = "scatter");

In [None]:
data["Content Rating"].value_counts().plot(kind = "bar");

In [None]:
category_list = ["Everyone", "Teen", "Adults only 18+"]
condition = data["Content Rating"].isin(category_list)
new_data = data[condition]
new_data.head()

In [None]:
new_data.groupby("Content Rating")['Rating'].mean().plot(kind = "bar");

In [None]:
category_list = ["Action", "Racing", "Sports", "Events"]
condition = data["Genres"].isin(category_list)
new_data = data[condition]
new_data.head()

In [None]:
new_data.groupby("Genres")['Rating'].mean().plot(kind = "bar");

<div class="alert alert-block alert-success">
Alright! That's the end of this tutorial. If you like it then please upvote this notebook.
</div>