## Load different sheets from PlayStore Apps dataset


#### **Instructions:**

Using the `playstore.xlsx` Excel file at the given `data_url`:

* Save the `Google_playstore` sheet in a `playstore_df` variable. Use the first column as the index.
* Save the `Content_ID` sheet in a `content_id_df` variable. Use `Content_ID` as the index.

In [8]:
import pandas as pd

In [14]:
data_url = 'https://github.com/ine-rmotr-projects/project-files/files/4086772/playstore.xlsx'

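To read the sheet named `Google_playstore`, we can use the `pd.read_excel()` function from `pandas` and pass the sheet name through the `sheet_name` parameter. This first attempt does not set `index_col`, so the sheet's first column comes back as a regular `Unnamed: 0` column rather than as the index. The same function reads the `Content_ID` sheet, where we will use `Content_ID` as the index.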

In [28]:
playstore_df = pd.read_excel(data_url, sheet_name='Google_playstore')

In [29]:
playstore_df.head()

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Installs,Type,Price,Content_ID,Genres,Last_Updated
0,0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,"10,000+",Free,0,101,Art & Design,"January 7, 2018"
1,1,Coloring book moana,ART_AND_DESIGN,3.9,"500,000+",Free,0,101,Art & Design;Pretend Play,"January 15, 2018"
2,2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,"5,000,000+",Free,0,101,Art & Design,"August 1, 2018"
3,3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,"50,000,000+",Free,0,102,Art & Design,"June 8, 2018"
4,4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,"100,000+",Free,0,101,Art & Design;Creativity,"June 20, 2018"


In [21]:
playstore_df.dtypes

Unnamed: 0        int64
App              object
Category         object
Rating          float64
Installs         object
Type             object
Price            object
Content_ID        int64
Genres           object
Last_Updated     object
dtype: object

To parse the date column of the `Google_playstore` sheet, pass its name to the `parse_dates` parameter of `pd.read_excel()`. Setting `index_col=0` uses the first column as the DataFrame index, which also removes the stray `Unnamed: 0` column seen above. As before, `sheet_name` selects which sheet to read from the Excel file.

In [33]:
playstore_df = pd.read_excel(data_url, sheet_name='Google_playstore', parse_dates=['Last_Updated'], index_col=0)

In [34]:
playstore_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 250 entries, 0 to 249
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   App           250 non-null    object        
 1   Category      250 non-null    object        
 2   Rating        239 non-null    float64       
 3   Installs      250 non-null    object        
 4   Type          250 non-null    object        
 5   Price         250 non-null    object        
 6   Content_ID    250 non-null    int64         
 7   Genres        250 non-null    object        
 8   Last_Updated  250 non-null    datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(1), object(6)
memory usage: 19.5+ KB


In [35]:
playstore_df.head()

Unnamed: 0,App,Category,Rating,Installs,Type,Price,Content_ID,Genres,Last_Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,"10,000+",Free,0,101,Art & Design,2018-01-07
1,Coloring book moana,ART_AND_DESIGN,3.9,"500,000+",Free,0,101,Art & Design;Pretend Play,2018-01-15
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,"5,000,000+",Free,0,101,Art & Design,2018-08-01
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,"50,000,000+",Free,0,102,Art & Design,2018-06-08
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,"100,000+",Free,0,101,Art & Design;Creativity,2018-06-20
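
As a rough illustration of what `parse_dates=['Last_Updated']` does here, the sketch below converts a few of the date strings from the first `head()` output with `pd.to_datetime`. It assumes the sheet stores these dates as text such as `"January 7, 2018"` (the earlier `object` dtype suggests so). The explicit `format` string is chosen for this example and is not part of the original notebook.

```python
import pandas as pd

# Date strings copied from the first `playstore_df.head()` output.
raw_dates = pd.Series(
    ["January 7, 2018", "January 15, 2018", "August 1, 2018"],
    name="Last_Updated",
)

# Assumed format for this sketch: full month name, day, four-digit year.
parsed_dates = pd.to_datetime(raw_dates, format="%B %d, %Y")

print(raw_dates.dtype)     # text dtype before conversion
print(parsed_dates.dtype)  # a datetime64 dtype after conversion
print(parsed_dates.dt.strftime("%Y-%m-%d").tolist())
```

Once the column holds real datetimes, accessors such as `.dt.year` become available, which is not the case for plain strings.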


In [37]:
content_id_df = pd.read_excel(data_url, sheet_name='Content_ID').set_index('Content_ID')

In [38]:
content_id_df.head()

Content_ID,Content_Rating
101,Everyone
101,Everyone
101,Everyone
102,Teen
101,Everyone
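
The effect of `.set_index('Content_ID')` can be reproduced on a small hand-built frame. The rows below are copied from the output above. Building the frame by hand is only for illustration, and it assumes the sheet has exactly the two columns shown, `Content_ID` and `Content_Rating`.

```python
import pandas as pd

# Rows copied from the `content_id_df.head()` output.
sample = pd.DataFrame({
    "Content_ID": [101, 101, 101, 102, 101],
    "Content_Rating": ["Everyone", "Everyone", "Everyone", "Teen", "Everyone"],
})

# Move the `Content_ID` column into the index; it is no longer a data column.
indexed = sample.set_index("Content_ID")

print(indexed.index.name)       # name of the new index
print(list(indexed.columns))    # remaining data columns
print(indexed.index.is_unique)  # whether every index label appears only once
```

Recent pandas versions document `index_col` as also accepting a column name, so `index_col='Content_ID'` may give the same result in one step. That variant is not used in this notebook.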


---

### With the `ExcelFile` class

In [39]:
file = pd.ExcelFile(data_url)

We can list the sheets in the workbook with the `sheet_names` attribute of the `ExcelFile` object, which returns a list of all sheet names. We can then pass any of these names to `parse()` to read that sheet into its own DataFrame.

In [40]:
file.sheet_names

['Google_playstore', 'Content_ID']

In [41]:
playstore_df = file.parse('Google_playstore', parse_dates=['Last_Updated'], index_col=0)

In [42]:
playstore_df.head()

Unnamed: 0,App,Category,Rating,Installs,Type,Price,Content_ID,Genres,Last_Updated
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,"10,000+",Free,0,101,Art & Design,2018-01-07
1,Coloring book moana,ART_AND_DESIGN,3.9,"500,000+",Free,0,101,Art & Design;Pretend Play,2018-01-15
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,"5,000,000+",Free,0,101,Art & Design,2018-08-01
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,"50,000,000+",Free,0,102,Art & Design,2018-06-08
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,"100,000+",Free,0,101,Art & Design;Creativity,2018-06-20


In [43]:
content_id_df = file.parse('Content_ID').set_index('Content_ID')

In [44]:
content_id_df.head()

Content_ID,Content_Rating
101,Everyone
101,Everyone
101,Everyone
102,Teen
101,Everyone
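
One reason to use `pd.ExcelFile` is that the workbook is opened once and each sheet is then parsed from the same object with its own options. The sketch below shows that pattern on a tiny in-memory workbook so that it runs without network access. The workbook contents (a small subset of the rows shown earlier), the `buffer` and `frames` names, and the use of the `openpyxl` engine are assumptions made for this example; only the sheet names come from the notebook.

```python
import io

import pandas as pd

# Two tiny stand-in sheets built from rows shown earlier in the notebook.
apps = pd.DataFrame({
    "App": ["Coloring book moana", "Sketch - Draw & Paint"],
    "Content_ID": [101, 102],
})
ratings = pd.DataFrame({
    "Content_ID": [101, 102],
    "Content_Rating": ["Everyone", "Teen"],
})

# Write both sheets to an in-memory .xlsx file (requires openpyxl).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    # The default index is written as an unnamed first column.
    apps.to_excel(writer, sheet_name="Google_playstore")
    ratings.to_excel(writer, sheet_name="Content_ID", index=False)
buffer.seek(0)

# Open the workbook once, then parse each sheet with its own options.
with pd.ExcelFile(buffer, engine="openpyxl") as workbook:
    sheet_names = workbook.sheet_names
    frames = {
        "Google_playstore": workbook.parse("Google_playstore", index_col=0),
        "Content_ID": workbook.parse("Content_ID").set_index("Content_ID"),
    }

print(sheet_names)
print(frames["Google_playstore"])
print(frames["Content_ID"])
```

Writing the first sheet with its default index mirrors the unnamed first column in `playstore.xlsx`, which is why `index_col=0` is used when reading it back.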
