# Data Analysis with Pandas and Python 

In [16]:
# Importing Pandas for Data Analysis
import pandas as pd
from datetime import datetime
from pytz import utc


# Creating a DataFrame with Pandas
reviews = pd.read_csv(r"udemy_reviews.csv", parse_dates=['Timestamp'])


# Turning Data into information

### The goal of data analysis is to turn data into valuable insight that drives informed, smart decisions
### For this purpose, let's set a list of relevant questions whose answers lead to conclusions based on the data we have


### 1- What is the average rating of the courses?

In [3]:
reviews['Rating'].mean()

4.442155555555556

### 2- Getting the rating for a particular course

In [7]:
# Getting the name of the desired course to measure
reviews['Course Name'].unique()

array(['The Python Mega Course: Build 10 Real World Applications',
       'The Complete Python Course: Build 10 Professional OOP Apps',
       '100 Python Exercises I: Evaluate and Improve Your Skills',
       'Interactive Data Visualization with Python and Bokeh',
       'Python for Beginners with Examples',
       'Data Processing with Python',
       '100 Python Exercises II: Evaluate and Improve Your Skills',
       'Learn GIS in One Hour'], dtype=object)

In [14]:
# Getting the rating for the selected course
py_beginners = reviews[reviews['Course Name'] == 'Python for Beginners with Examples']
py_beginners['Rating'].mean()

# The same thing in a single, more compact expression
reviews[reviews['Course Name'] == 'Python for Beginners with Examples']['Rating'].mean()



4.300974901472723
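
Filtering one course at a time works, but the same figure can be obtained for every course at once with `groupby`. The sketch below is a minimal illustration on a made-up sample (the `sample` frame and its course names are hypothetical, not rows from `udemy_reviews.csv`); only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical sample standing in for the reviews DataFrame
sample = pd.DataFrame({
    "Course Name": ["Course A", "Course A", "Course B", "Course B"],
    "Rating": [5.0, 4.0, 3.0, 4.0],
})

# Mean rating per course, sorted from best to worst rated
per_course = (
    sample.groupby("Course Name")["Rating"]
    .mean()
    .sort_values(ascending=False)
)
print(per_course)
```

Applied to `reviews`, the same expression should give one average per entry returned by `reviews['Course Name'].unique()`.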

### 3- Average Rating for the last Quarter of 2020

In [25]:
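# Note: datetime(2020, 12, 31) is midnight at the start of Dec 31,
# so reviews posted later that day fall outside this range
q4 = reviews[(reviews['Timestamp'] >= datetime(2020, 10, 1, tzinfo=utc)) &
             (reviews['Timestamp'] <= datetime(2020, 12, 31, tzinfo=utc))]['Rating'].mean()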


print('The 4th quarter of 2020 average rating is: '+ str(round(q4, 2)))

The 4th quarter of 2020 average rating is: 4.51
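
A date range that ends with `<= datetime(2020, 12, 31)` stops at 00:00 on December 31, so the last day of the quarter is mostly left out. One common way around this is a half-open interval: `>=` the first instant of the quarter and `<` the first instant of the next one. The sketch below shows the difference on a small made-up sample (the timestamps and ratings are hypothetical); it uses `datetime.timezone.utc` instead of `pytz` only to stay dependency-free.

```python
from datetime import datetime, timezone

import pandas as pd

# Hypothetical reviews around the Q4 2020 boundaries
sample = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        [
            "2020-09-30 23:00:00",
            "2020-10-01 00:00:00",
            "2020-12-31 18:30:00",
            "2021-01-01 00:00:00",
        ],
        utc=True,
    ),
    "Rating": [1.0, 4.0, 5.0, 2.0],
})

start = datetime(2020, 10, 1, tzinfo=timezone.utc)
end = datetime(2021, 1, 1, tzinfo=timezone.utc)  # first instant after Q4

# Half-open interval [start, end): keeps all of Dec 31, drops Jan 1
in_q4 = (sample["Timestamp"] >= start) & (sample["Timestamp"] < end)
q4_mean = sample.loc[in_q4, "Rating"].mean()

# Closed interval ending at midnight of Dec 31, as a comparison
closed = (sample["Timestamp"] >= start) & (
    sample["Timestamp"] <= datetime(2020, 12, 31, tzinfo=timezone.utc)
)
print(int(in_q4.sum()), int(closed.sum()))
```

In this sample the half-open filter keeps two reviews while the closed one keeps only one, because the review posted on the evening of December 31 is dropped.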


### We can compare it with the previous quarter of 2020 to draw a conclusion about people's perception of the courses

In [33]:
q3 = reviews[(reviews['Timestamp'] >= datetime(2020, 7, 1, tzinfo=utc)) &
             (reviews['Timestamp'] <= datetime(2020, 9, 30, tzinfo=utc))]['Rating'].mean()


print('The 3rd quarter of 2020 average rating is: ' + str(round(q3, 2)))

The 3rd quarter of 2020 average rating is: 4.49
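
Instead of writing one filter per quarter, the averages for all quarters can be computed in one pass by grouping on the year and quarter of the timestamp. This is a minimal sketch on made-up data (the `sample` frame is hypothetical); it relies on the `.dt.year` and `.dt.quarter` accessors, which work on timezone-aware columns such as `Timestamp`.

```python
import pandas as pd

# Hypothetical reviews spread over Q3 and Q4 of 2020
sample = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2020-07-15", "2020-09-01", "2020-10-10", "2020-12-31"],
        utc=True,
    ),
    "Rating": [4.0, 5.0, 3.0, 4.0],
})

ts = sample["Timestamp"]

# One mean per (year, quarter) pair
by_quarter = sample.groupby(
    [ts.dt.year.rename("year"), ts.dt.quarter.rename("quarter")]
)["Rating"].mean()
print(by_quarter)
```

The result is indexed by `(year, quarter)`, so consecutive quarters can be read side by side without repeating the date arithmetic.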


### 4- Average rating of the reviews with no comment

In [34]:
reviews[reviews['Comment'].isnull()]['Rating'].mean()

4.433679746603492

In [39]:
# Number of reviews with no comment
reviews[reviews['Comment'].isnull()]['Rating'].count()


38201

### We can compare it with the average rating of the reviews with a comment

In [38]:
reviews[reviews['Comment'].notnull()]['Rating'].mean()

4.489777908515959

In [40]:
# Number of reviews with a comment
reviews[reviews['Comment'].notnull()]['Rating'].count()

6799
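
The four cells above (mean and count, with and without a comment) can be collapsed into a single summary by grouping on whether `Comment` is present. The sketch below uses a made-up sample (the `sample` frame and the group labels are hypothetical) and is only one possible way to lay out the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical reviews, two of them without a comment
sample = pd.DataFrame({
    "Rating": [5.0, 4.0, 3.0, 4.0],
    "Comment": ["Great course", np.nan, np.nan, "Clear explanations"],
})

# Label each review according to whether it has a comment
group = sample["Comment"].notna().map(
    {True: "with comment", False: "no comment"}
)

# Mean rating and number of reviews for each group
summary = sample.groupby(group)["Rating"].agg(["mean", "count"])
print(summary)
```

Showing the count next to the mean makes it easier to judge how much weight each average deserves.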

### 5- Exploring people's opinion about the accent in courses

In [41]:
reviews[reviews['Comment'].str.contains('accent')]

ValueError: Cannot mask with non-boolean array containing NA / NaN values

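### Solving the 'ValueError: Cannot mask with non-boolean array containing NA / NaN values' error
### The 'Comment' column has missing values, so str.contains returns NaN for those rows; passing na=False treats them as non-matches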

In [42]:
reviews[reviews['Comment'].str.contains('accent', na=False)]

| | Course Name | Timestamp | Rating | Comment |
|---|---|---|---|---|
| 2099 | The Complete Python Course: Build 10 Professio... | 2021-01-24 19:02:55+00:00 | 4.5 | The course is great but because of the instruc... |
| 3025 | The Python Mega Course: Build 10 Real World Ap... | 2021-01-01 14:46:21+00:00 | 2.5 | Sometimes it is difficult to understand the in... |
| 3477 | The Python Mega Course: Build 10 Real World Ap... | 2020-12-18 06:00:42+00:00 | 4.5 | A little trouble at first with the accent, but... |
| 3754 | The Python Mega Course: Build 10 Real World Ap... | 2020-12-08 13:42:00+00:00 | 5.0 | I was looking for some time for the good Pytho... |
| 4690 | Learn GIS in One Hour | 2020-11-11 17:23:12+00:00 | 5.0 | I am having trouble seeing the video and he is... |
| ... | ... | ... | ... | ... |
| 43785 | Python for Beginners with Examples | 2018-01-26 09:19:40+00:00 | 4.0 | Please find the English review below!\n\nDiese... |
| 44040 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-20 01:06:54+00:00 | 5.0 | HIGHLY RECOMMEND THIS COURSE! BUT PLEASE READ ... |
| 44471 | Learn GIS in One Hour | 2018-01-11 11:11:12+00:00 | 4.5 | This is a really nice course for beginners who... |
| 44483 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-11 05:06:48+00:00 | 3.0 | Speakers accent was a bit difficult for me to ... |
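
To see what `na=False` does in isolation, the sketch below runs `str.contains` on a tiny made-up Series with one missing value (the `comments` Series is hypothetical). It also shows `case=False`, an existing `str.contains` parameter that the notebook does not use; whether case-insensitive matching is wanted here is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical comments, one of them missing
comments = pd.Series(
    ["Strong accent", np.nan, "Great pace", "The ACCENT is fine"]
)

# na=False turns the missing comment into False, giving a clean boolean mask
mask = comments.str.contains("accent", na=False)

# case=False additionally matches "ACCENT", "Accent", and so on
mask_ci = comments.str.contains("accent", case=False, na=False)

print(mask.tolist())
print(mask_ci.tolist())
```

With the case-sensitive pattern used in the notebook, a comment written as "ACCENT" or "Accent" would not be counted, so the match count there may be a lower bound.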


In [43]:
# Getting the number of comments that mention the accent in the courses
reviews[reviews['Comment'].str.contains('accent', na=False)]['Rating'].count()

77

In [44]:
# Getting the average rating of the reviews that mention the accent
reviews[reviews['Comment'].str.contains('accent', na=False)]['Rating'].mean()

3.8636363636363638

In [45]:
# Compared with the overall average rating, reviews that mention the accent are rated lower on average (3.86 vs 4.44), which suggests that the accent has a negative effect on users' perception
reviews['Rating'].mean()

4.442155555555556
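
The overall average also includes the reviews that mention the accent, so a slightly cleaner baseline is the average of the reviews that do not mention it. The sketch below illustrates that comparison on made-up data (the `sample` frame is hypothetical); it reuses the same `str.contains('accent', na=False)` mask and inverts it with `~`.

```python
import numpy as np
import pandas as pd

# Hypothetical reviews, two of which mention the accent
sample = pd.DataFrame({
    "Rating": [3.0, 4.0, 5.0, 4.0, 5.0],
    "Comment": ["Hard accent", "The accent is ok", np.nan, "Nice", "Loved it"],
})

mentions_accent = sample["Comment"].str.contains("accent", na=False)

# Average rating for each side of the mask, plus the size of the accent group
accent_mean = sample.loc[mentions_accent, "Rating"].mean()
other_mean = sample.loc[~mentions_accent, "Rating"].mean()
accent_count = int(mentions_accent.sum())

print(accent_mean, round(other_mean, 2), accent_count)
```

Keeping the group size next to the averages is a reminder that the accent group in the notebook is small (77 reviews), so the gap is an indication rather than a firm conclusion.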