<a href="https://colab.research.google.com/github/ormastroni/fundamentos-python/blob/main/aula12.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Development Fundamentals

## Prof. Andre Victor

### DataFrames

DataFrames are data structures provided by the pandas library that offer an abstraction of the relational data model: the data is organized in rows and columns and can be accessed and indexed by position and by column, much like Series.

The main difference between dataframes and Series is that dataframes are two-dimensional. Both are indexed by a key/index. In a Series, however, the key maps to a single atomic value, whereas in a dataframe the key maps to multiple values, one from each Series. In practice, a dataframe combines several Series into a single structure.

<div align='center'>
<img src='https://drive.google.com/uc?id=184DHjqiuChmbWtLiFR6ETby4gQqs1eAf' width=50% height=50%>
</div>

Image source: https://towardsdatascience.com/how-to-master-pandas-for-data-science-b8ab0a9b1042
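
The idea that a dataframe combines several Series under one shared index can be sketched as follows. The names and values are hypothetical sample data, not taken from the lecture's dataset.

```python
import pandas as pd

# Hypothetical sample data: two Series that share the same index labels.
nomes = pd.Series(['ana', 'bruno', 'carla'], index=[10, 20, 30])
idades = pd.Series([31, 25, 40], index=[10, 20, 30])

# Each Series becomes one column; the shared index labels the rows.
df = pd.DataFrame({'Nome': nomes, 'Idade': idades})

# Selecting a column gives back a Series with the same index.
coluna = df['Idade']

# Selecting a row gives a Series keyed by the column names.
linha = df.loc[20]
```

Here `coluna` maps each key to one atomic value, while `linha` shows the key `20` mapped to one value per column.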



In [1]:
import pandas as pd

### Creating dataframes

DataFrames can be created in several ways. The most common is from the result of SQL queries on database tables or by loading external files, such as CSV files.
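
As a minimal sketch of these two loading paths, the example below reads CSV text held in memory and queries an in-memory SQLite table. The table name `pessoas`, its columns, and the values are all hypothetical stand-ins for a real file or database.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content held in memory so the example needs no file.
csv_text = "nome,idade\nana,31\nbruno,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical in-memory SQLite table standing in for a real database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE pessoas (nome TEXT, idade INTEGER)')
conn.executemany('INSERT INTO pessoas VALUES (?, ?)',
                 [('ana', 31), ('bruno', 25)])
conn.commit()

# The result set of the query becomes the rows of the dataframe.
df_sql = pd.read_sql_query('SELECT nome, idade FROM pessoas ORDER BY nome', conn)
conn.close()
```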

However, dataframes can also be created explicitly from lists, tuples, dictionaries, and Series. Let's look at a few approaches here.

For example, a dataframe can be created from a list of tuples:

In [2]:
lista = [
    ('andre', '1234', 24),
    ('joao', '4567', 33),
    ('cecilia', '7890', 18),
    ('maria', '2345', 22)
]

In [3]:
df_pessoas = pd.DataFrame(lista)
df_pessoas

Unnamed: 0,0,1,2
0,andre,1234,24
1,joao,4567,33
2,cecilia,7890,18
3,maria,2345,22


In [4]:
print(df_pessoas)

         0     1   2
0    andre  1234  24
1     joao  4567  33
2  cecilia  7890  18
3    maria  2345  22


In [5]:
type(df_pessoas)

pandas.core.frame.DataFrame

In [6]:
df_pessoas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       4 non-null      object
 1   1       4 non-null      object
 2   2       4 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 224.0+ bytes


DataFrames are indexable structures, similar to lists, dictionaries, and Series:

In [7]:
df_pessoas.loc[0,1]

'1234'

In [8]:
df_pessoas.loc[1]

0    joao
1    4567
2      33
Name: 1, dtype: object

In [9]:
df_pessoas.loc[0:2, 1:2]

Unnamed: 0,1,2
0,1234,24
1,4567,33
2,7890,18


In [10]:
df_pessoas.loc[1:2,1:2]

Unnamed: 0,1,2
1,4567,33
2,7890,18
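
Note that slices passed to `.loc` are label-based and include both endpoints, which is why `1:2` returned two rows and two columns above. A small sketch contrasting this with position-based `.iloc`, which excludes the end of the slice:

```python
import pandas as pd

df = pd.DataFrame([
    ('andre', '1234', 24),
    ('joao', '4567', 33),
    ('cecilia', '7890', 18),
    ('maria', '2345', 22),
])

# Label-based: rows labeled 1 and 2, columns labeled 1 and 2 (end included).
por_rotulo = df.loc[1:2, 1:2]

# Position-based: only position 1 on each axis (end excluded).
por_posicao = df.iloc[1:2, 1:2]
```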


However, working with dataframes is more intuitive when the data is accessed through the names assigned to the columns rather than through positional indexes.

In [11]:
df_pessoas.columns = ['Nome', 'Matr', 'Idade']

In [12]:
df_pessoas

Unnamed: 0,Nome,Matr,Idade
0,andre,1234,24
1,joao,4567,33
2,cecilia,7890,18
3,maria,2345,22
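
The column names can also be given when the dataframe is created, through the `columns` parameter of `pd.DataFrame`. A minimal sketch with a shortened version of the list:

```python
import pandas as pd

lista = [
    ('andre', '1234', 24),
    ('joao', '4567', 33),
]

# Name the columns at construction time instead of assigning them afterwards.
df = pd.DataFrame(lista, columns=['Nome', 'Matr', 'Idade'])
```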


In [13]:
df_pessoas['Nome']

0      andre
1       joao
2    cecilia
3      maria
Name: Nome, dtype: object

In [14]:
df_pessoas[['Nome']]

Unnamed: 0,Nome
0,andre
1,joao
2,cecilia
3,maria
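
The two selections above differ in the type they return: a single column name yields a Series, while a list of names yields a dataframe, even when the list has only one name. A short sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'Nome': ['andre', 'joao'], 'Idade': [24, 33]})

# A single label returns a one-dimensional Series.
serie = df['Nome']

# A list of labels returns a two-dimensional DataFrame.
quadro = df[['Nome']]
```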


In [15]:
df_pessoas[['Matr', 'Idade']]

Unnamed: 0,Matr,Idade
0,1234,24
1,4567,33
2,7890,18
3,2345,22


Dataframes can also be created from a dictionary of tuples or of lists:

In [16]:
nomes = ['andre', 'joao', 'cecilia', 'maria']
idades = [24, 33, 18, 22]
matric = ['1234', '4567', '7890', '2345']

In [17]:
tupla_nomes = tuple(nomes)
tupla_idades = tuple(idades)
tupla_matric = tuple(matric)

In [18]:
tupla_nomes

('andre', 'joao', 'cecilia', 'maria')

In [19]:
pd.DataFrame({'nomes': nomes, 'idades': idades, 'matric': matric})

Unnamed: 0,nomes,idades,matric
0,andre,24,1234
1,joao,33,4567
2,cecilia,18,7890
3,maria,22,2345


In [20]:
pd.DataFrame({'nomes': tupla_nomes, 'idades': tupla_idades, 'matric': tupla_matric})

Unnamed: 0,nomes,idades,matric
0,andre,24,1234
1,joao,33,4567
2,cecilia,18,7890
3,maria,22,2345
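
When building a dataframe from a dictionary, each key becomes a column name and all the sequences are expected to have the same length. The sketch below, with a deliberately shortened hypothetical list, shows pandas rejecting mismatched lengths:

```python
import pandas as pd

nomes = ['andre', 'joao', 'cecilia']
idades = [24, 33]  # deliberately one value short

try:
    pd.DataFrame({'nomes': nomes, 'idades': idades})
    erro = None
except ValueError as exc:
    # pandas refuses to build columns of different lengths.
    erro = exc
```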


### Accessing dataframe data

Access works much like the access methods of Series.

Let's work again with the Play Store dataframe.

In [21]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [22]:
import os
os.chdir("/content/drive/My Drive/cursos/fundamentos python/shared/datasets")

In [23]:
df_apps = pd.read_csv('googleplaystore.csv')
df_apps

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device


In [24]:
df_apps.dropna(subset=['Rating'])

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10834,FR Calculator,FAMILY,4.0,7,2.6M,500+,Free,0,Everyone,Education,"June 18, 2017",1.0.0,4.1 and up
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device
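
`dropna` returns a new dataframe and leaves the original untouched, which is why row labels such as 10835 and 10838 are missing from the output above while `df_apps` itself keeps all of its rows. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'App': ['a', 'b', 'c'], 'Rating': [4.1, np.nan, 3.9]})

# Drop rows whose 'Rating' is missing; the result is a new object.
sem_nulos = df.dropna(subset=['Rating'])

# The surviving rows keep their original labels (0 and 2 here),
# and df still has all three rows.
```

To keep the filtered result, assign it to a variable, as done with `sem_nulos` here.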


In [25]:
df_apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [26]:
df_apps['Reviews'] = df_apps['Reviews'].astype(int)

ValueError: ignored

In [27]:
df_apps[df_apps['Reviews'] == '3.0M']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,
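
One possible way to locate values that block a numeric conversion, without relying on the error message, is `pd.to_numeric` with `errors='coerce'`. This is a sketch on a small hypothetical Series rather than on the full dataset:

```python
import pandas as pd

reviews = pd.Series(['159', '967', '3.0M', '87510'])

# Values that cannot be parsed become NaN instead of raising an error.
convertido = pd.to_numeric(reviews, errors='coerce')

# Show the original entries that failed to convert.
# (Entries that were already missing would also appear here.)
problemas = reviews[convertido.isna()]
```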


In [28]:
df_apps.loc[10472]

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object

In [29]:
df_apps.loc[10472, 'Reviews'] = 3000000

In [30]:
df_apps.loc[10472]

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                           3000000
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object

In [31]:
df_apps['Reviews'] = df_apps['Reviews'].astype(int)

In [32]:
df_apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  int64  
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), int64(1), object(11)
memory usage: 1.1+ MB


In [33]:
df_amostra = df_apps.sample(n=3)

In [34]:
df_amostra

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
3969,Servers Ultimate Pack B,TOOLS,4.3,668,13M,"50,000+",Free,0,Everyone,Tools,"March 31, 2016",2.1.8,2.1 and up
8339,DF Wall Plus – Droid Firewall,TOOLS,,9,6.3M,500+,Free,0,Everyone,Tools,"August 20, 2017",1.0,4.0.3 and up
8812,DS cloud,TOOLS,3.2,4908,38M,"500,000+",Free,0,Everyone,Tools,"May 23, 2018",2.8.0,4.0 and up
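
Because `sample` picks rows at random, the labels in the sample change on every run. A sketch, using hypothetical labels and values, of how `random_state` makes the sample repeatable and how label-based and position-based access relate on a sampled frame:

```python
import pandas as pd

# Hypothetical frame with non-contiguous row labels.
df = pd.DataFrame({'Rating': [4.3, 3.2, 4.7, 4.0, 3.9]},
                  index=[101, 205, 330, 412, 587])

# The same random_state gives the same rows every time.
amostra_a = df.sample(n=3, random_state=42)
amostra_b = df.sample(n=3, random_state=42)

# .loc uses the row label, .iloc uses the position inside the sample.
primeiro_rotulo = amostra_a.index[0]
por_rotulo = amostra_a.loc[primeiro_rotulo, 'Rating']
por_posicao = amostra_a.iloc[0, 0]
```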


In [35]:
df_amostra.loc[8339, 'Rating']

nan

In [36]:
df_amostra.iloc[0,2]

4.3

In [None]:
df_amostra.loc[7014, 'Rating']

3.7

In [None]:
df_apps.loc[8250:8255, 'Category':'Reviews']

Unnamed: 0,Category,Rating,Reviews
8250,FAMILY,4.3,43090
8251,FAMILY,4.2,2557
8252,GAME,4.5,937
8253,FAMILY,4.3,139545
8254,GAME,4.3,22333
8255,PERSONALIZATION,4.2,249


In [38]:
df_apps.loc[[8250, 10472], ['Category', 'Rating', 'Size']]

Unnamed: 0,Category,Rating,Size
8250,FAMILY,4.3,95M
10472,1.9,19.0,"1,000+"


In [39]:
df_apps.at[8250, 'Reviews']

43090

In [None]:
df_apps.loc[8250:8255, 'Category':'Reviews'].iat[0,2]

43090
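
`.at` and `.iat` access a single cell: `.at` by row and column labels, `.iat` by integer positions. A minimal sketch with hypothetical labels and values:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['FAMILY', 'GAME'], 'Reviews': [100, 200]},
                  index=[8250, 8252])

# Label-based scalar access: row label 8252, column 'Reviews'.
por_rotulo = df.at[8252, 'Reviews']

# Position-based scalar access: second row, second column.
por_posicao = df.iat[1, 1]

# Both accessors can also be used to assign a single value.
df.at[8250, 'Reviews'] = 150
```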

### Data cleaning

In [46]:
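def trata_size(valor):
    # Strip the unit suffixes: every 'M' and 'k' in the string is removed.
    # The number is not rescaled, and the result is still a string.
    result = valor.replace('M', '')
    result = result.replace('k', '')
    return result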

In [47]:
trata_size("2.0M")

'2.0'

In [48]:
trata_size('M3.4M5')

'3.45'

In [49]:
df_apps['Size_N'] = df_apps['Size'].apply(trata_size)

In [50]:
df_apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  int64  
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
 13  Size_N          10841 non-null  object 
dtypes: float64(1), int64(1), object(12)
memory usage: 1.2+ MB


In [51]:
df_apps.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Size_N
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8


In [53]:
df_apps['Size_N']

0                        19
1                        14
2                       8.7
3                        25
4                       2.8
                ...        
10836                    53
10837                   3.6
10838                   9.5
10839    Varies with device
10840                    19
Name: Size_N, Length: 10841, dtype: object

In [55]:
df_apps.tail()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Size_N
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up,53
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up,3.6
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up,9.5
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device,Varies with device
10840,iHoroscope - 2018 Daily Horoscope & Astrology,LIFESTYLE,4.5,398307,19M,"10,000,000+",Free,0,Everyone,Lifestyle,"July 25, 2018",Varies with device,Varies with device,19


In [61]:
df_apps_com_tam = df_apps[df_apps['Size_N'] != 'Varies with device']

In [65]:
df_apps_com_tam.shape

(9146, 14)

In [59]:
df_apps_com_tam.tail()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Size_N
10835,FR Forms,BUSINESS,,0,9.6M,10+,Free,0,Everyone,Business,"September 29, 2016",1.1.5,4.0 and up,9.6
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up,53.0
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up,3.6
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up,9.5
10840,iHoroscope - 2018 Daily Horoscope & Astrology,LIFESTYLE,4.5,398307,19M,"10,000,000+",Free,0,Everyone,Lifestyle,"July 25, 2018",Varies with device,Varies with device,19.0


In [66]:
df_apps_com_tam.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9146 entries, 0 to 10840
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             9146 non-null   object 
 1   Category        9146 non-null   object 
 2   Rating          7730 non-null   float64
 3   Reviews         9146 non-null   int64  
 4   Size            9146 non-null   object 
 5   Installs        9146 non-null   object 
 6   Type            9146 non-null   object 
 7   Price           9146 non-null   object 
 8   Content Rating  9145 non-null   object 
 9   Genres          9146 non-null   object 
 10  Last Updated    9146 non-null   object 
 11  Current Ver     9138 non-null   object 
 12  Android Ver     9143 non-null   object 
 13  Size_N          9146 non-null   object 
dtypes: float64(1), int64(1), object(12)
memory usage: 1.0+ MB


In [67]:
df_apps_com_tam['Size_N'] = df_apps_com_tam['Size_N'].astype(float)

ValueError: ignored
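
The conversion fails when some remaining value is not a valid number, for instance a string like `'1,000+'`, which appears in the `Size` column of row 10472. A hedged sketch of one alternative: copy the filtered frame (so the assignment does not target a view of `df_apps`) and let `pd.to_numeric` turn unparseable values into `NaN`. The data below is a small hypothetical sample.

```python
import pandas as pd

df = pd.DataFrame({'Size_N': ['19', '8.7', 'Varies with device', '1,000+']})

# .copy() makes the filtered frame independent of the original.
com_tam = df[df['Size_N'] != 'Varies with device'].copy()

# Unparseable strings such as '1,000+' become NaN instead of raising.
com_tam['Size_N'] = pd.to_numeric(com_tam['Size_N'], errors='coerce')
```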

In [68]:
df_apps['Size'].value_counts()

Varies with device    1695
11M                    198
12M                    196
14M                    194
13M                    191
                      ... 
582k                     1
34k                      1
219k                     1
624k                     1
916k                     1
Name: Size, Length: 462, dtype: int64
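
The counts above show sizes in two units, `M` and `k`, so simply stripping the suffix mixes megabytes with kilobytes. A sketch of a unit-aware conversion follows; it assumes that 1 M equals 1024 k, which the dataset itself does not state, and uses a small hypothetical sample.

```python
import pandas as pd

sizes = pd.Series(['19M', '8.7M', '582k', 'Varies with device', '1,000+'])

# Split each entry into a number and a unit; non-matching rows become NaN.
partes = sizes.str.extract(r'^(?P<valor>\d+(?:\.\d+)?)(?P<unidade>[Mk])$')
valor = pd.to_numeric(partes['valor'])

# Assumption: 1 M = 1024 k, so kilobyte values are scaled down to megabytes.
fator = partes['unidade'].map({'M': 1.0, 'k': 1 / 1024})

size_mb = valor * fator
```

Entries without a recognizable number and unit end up as `NaN` and can be inspected or dropped afterwards.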

### Grouping data

Data can be grouped by one or more attributes. When a grouping is performed, the grouping attribute becomes the index of the resulting dataframe.
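
A minimal sketch of this behavior on hypothetical data: grouping by one column, grouping by two columns, and using `as_index=False` to keep the grouping attribute as a regular column.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'FAMILY', 'FAMILY'],
    'Type': ['Free', 'Paid', 'Free', 'Free'],
    'Reviews': [10, 5, 7, 3],
})

# One grouping attribute: 'Category' becomes the index of the result.
por_categoria = df.groupby('Category')['Reviews'].sum()

# Two grouping attributes: the result is indexed by (Category, Type) pairs.
por_categoria_tipo = df.groupby(['Category', 'Type'])['Reviews'].sum()

# as_index=False keeps 'Category' as an ordinary column.
plano = df.groupby('Category', as_index=False)['Reviews'].sum()
```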

In [72]:
df_apps.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Size_N
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,19.0
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,14.0
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,8.7
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up,25.0
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up,2.8
5,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up,5.6
6,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19M,"50,000+",Free,0,Everyone,Art & Design,"April 26, 2018",1.1,4.0.3 and up,19.0
7,Infinite Painter,ART_AND_DESIGN,4.1,36815,29M,"1,000,000+",Free,0,Everyone,Art & Design,"June 14, 2018",6.1.61.1,4.2 and up,29.0
8,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33M,"1,000,000+",Free,0,Everyone,Art & Design,"September 20, 2017",2.9.2,3.0 and up,33.0
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up,3.1


In [75]:
df_apps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  int64  
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
 13  Size_N          10841 non-null  object 
dtypes: float64(1), int64(1), object(12)
memory usage: 1.2+ MB


In [73]:
df_apps_tipo = df_apps.groupby(['Category']).sum(numeric_only=True)

In [74]:
df_apps_tipo

Category,Rating,Reviews
1.9,19.0,3000000
ART_AND_DESIGN,270.2,1714440
AUTO_AND_VEHICLES,305.9,1163666
BEAUTY,179.7,396240
BOOKS_AND_REFERENCE,773.6,21959069
BUSINESS,1248.8,13954552
COMICS,241.0,3383276
COMMUNICATION,1364.0,815462260
DATING,774.3,7291278
EDUCATION,680.3,39595786


In [None]:
serie_tipo = df_apps.groupby('Category')['Reviews'].sum()

In [None]:
serie_tipo

Category
1.9                       3000000
ART_AND_DESIGN            1714440
AUTO_AND_VEHICLES         1163666
BEAUTY                     396240
BOOKS_AND_REFERENCE      21959069
BUSINESS                 13954552
COMICS                    3383276
COMMUNICATION           815462260
DATING                    7291278
EDUCATION                39595786
ENTERTAINMENT            59178154
EVENTS                     161018
FAMILY                  410226330
FINANCE                  17550728
FOOD_AND_DRINK            8883330
GAME                   1585422349
HEALTH_AND_FITNESS       37893743
HOUSE_AND_HOME            3976385
LIBRARIES_AND_DEMO        1037118
LIFESTYLE                12882784
MAPS_AND_NAVIGATION      30659254
MEDICAL                   1585975
NEWS_AND_MAGAZINES       54400863
PARENTING                  958331
PERSONALIZATION          89346140
PHOTOGRAPHY             213516650
PRODUCTIVITY            114116975
SHOPPING                115041222
SOCIAL                  621241422
SPORT

In [None]:
serie_rating = df_apps.groupby('Category')['Rating'].mean()

In [None]:
serie_rating

Category
1.9                    19.000000
ART_AND_DESIGN          4.358065
AUTO_AND_VEHICLES       4.190411
BEAUTY                  4.278571
BOOKS_AND_REFERENCE     4.346067
BUSINESS                4.121452
COMICS                  4.155172
COMMUNICATION           4.158537
DATING                  3.970769
EDUCATION               4.389032
ENTERTAINMENT           4.126174
EVENTS                  4.435556
FAMILY                  4.192272
FINANCE                 4.131889
FOOD_AND_DRINK          4.166972
GAME                    4.286326
HEALTH_AND_FITNESS      4.277104
HOUSE_AND_HOME          4.197368
LIBRARIES_AND_DEMO      4.178462
LIFESTYLE               4.094904
MAPS_AND_NAVIGATION     4.051613
MEDICAL                 4.189143
NEWS_AND_MAGAZINES      4.132189
PARENTING               4.300000
PERSONALIZATION         4.335987
PHOTOGRAPHY             4.192114
PRODUCTIVITY            4.211396
SHOPPING                4.259664
SOCIAL                  4.255598
SPORTS                  4.223511
T