As the final exercise in the course SQL for Data Science. This assignment is designed to test the knowledge of a wide range of concepts and SQL design techniques discussed throughout the course. Using a dataset from a US-based organization called Yelp Open Dataset , which provides a platform for users to provide reviews and rate their interactions with a variety of organizations – businesses, restaurants, health clubs, hospitals, local governmental offices, charitable organizations, etc. Yelp has made a portion of this data available for personal, educational, and academic purposes. Check out the Yelp Dataset ER Diagram and instructions below for more detailed information on the dataset and how was completed the assignment.
Note: For most information about the Dataset visit:
The entity relationship (ER) diagram below, should help familiarize you with the design of the Yelp Dataset provided for this peer review activity.
Note: Primary Keys are denoted in the ER-Diagram with a yellow key icon.
you are asked a series of questions that will help you profile and understand the data just like a data scientist would. For this first part of the assignment, you will be assessed both on the correctness of your findings, as well as the code you used to arrive at your answer. You will be graded on how easy your code is to read, so remember to use proper formatting and comments where necessary.
Answer: The dataset have 11 tables with 10.000 rows every each.
-- Code Used --
+ SELECT COUNT(*)
+ FROM "name of the table"
2. Find the total distinct records by either the foreign key or primary key for each table. If two foreign keys are listed in the table, please specify which foreign key:
Answer:
- Business = (PK), id= 10000
- Hours = (FK), business_id= 1562.
- Category = (FK), business_id= 2643.
- Attribute = (FK), business_id= 1115.
- Review = (PK), id=10000, (FK) business_id= 8090, (FK)User_id= 9581.
- Checkin = (FK), business_id= 493.
- Photo = (PK), id=10000, (FK)business_id= 6493.
- Tip = (FK), User_id= 537, (FK)business_id= 3979.
- User = (PK),id=10000.
- Friend = (FK), User_id= 11, (FK)
- Elite_years = (FK), User_id= 2780.
-- Code Used --
+ SELECT count(DISTINCT /*Primary or Foreign Key*/)
+ FROM -- Table
Answer: NO
-- Code Used --
+ SELECT COUNT(*)
+ FROM User
+ WHERE id IS NULL
+ OR name IS NULL
+ OR review_count IS NULL
+ OR yelping_since IS NULL
+ OR useful IS NULL
+ OR funny IS NULL
+ OR cool IS NULL
+ OR fans IS NULL
+ OR average_stars IS NULL
+ OR compliment_hot IS NULL
+ OR compliment_more IS NULL
+ OR compliment_profile IS NULL
+ OR compliment_cute IS NULL
+ OR compliment_list IS NULL
+ OR compliment_note IS NULL
+ OR compliment_plain IS NULL
+ OR compliment_cool IS NULL
+ OR compliment_funny IS NULL
+ OR compliment_writer IS NULL
+ OR compliment_photos IS NUL
4. For each table and column listed below, display the smallest (minimum), largest (maximum), and average (mean) value for the following fields:
Answer:
- Table: Review, Column: Stars
- Table: Business, Column: Stars
- Table: Tip, Column: Likes
- Table: Checkin, Column: Count
- Table: User, Column: Review_count
min: 1.0 max: 5.0 avg: 3.7082
min: 1.0 max: 5.0 avg: 3.6549
min: 0 max: 2 avg: 0.0144
min: 1 max: 53 avg: 1.9414
min: 0 max: 2000 avg: 24.2995
-- Code Used --
+SELECT MIN(/*Name of the column*/) AS "MIN"
+ ,MAX(/*Name of the column*/) AS "MAX"
+ ,AVG(/*Name of the column*/) AS "AVG"
+FROM /*Name of the Table*/
Answer:
City | All_reviews_by_city |
---|---|
Las Vegas
Phoenix Toronto Scottsdale Charlotte Henderson Tempe Pittsburgh Montréal Chandler Mesa Gilgert Cleveland Madison Glendale Mississauga Edinburgh Peoria North Las Vegas Markham Champaign Stuttgart Surprise Lakewood Goodyear |
82854 34503 24113 20614 12523 10871 10504 9798 9448 8112 6875 6380 5593 5265 4406 3814 2792 2624 2438 2352 2029 1849 1520 1465 1155 |
-- Code Used --
+ SELECT city
+ ,SUM(review_count) AS All_reviews_by_city
+ FROM business
+ GROUP BY city
+ ORDER BY all_reviews_by_city DESC
Avon:
Answer:
Star rating | count |
---|---|
1.5 2.5 3.5 4.0 4.5 5.0 |
10 6 88 21 31 3 |
-- Code Used --
+ SELECT stars AS "Star rating"
+ ,sum(review_count) AS "count"
+ FROM business
+ WHERE city == "Avon"
+ GROUP BY stars
Answer:
name | total number of reviews |
---|---|
Gerald Sara Yuri |
2000 1629 1339 |
-- Code Used --
+ SELECT name
+ ,sum(review_count) AS "total number of reviews"
+ FROM user
+ GROUP BY id
+ ORDER BY sum(review_count) DESC
+ LIMIT 3
Answer:
Tables 1 and 2 compare the number of reviews vs the number of fans. It is observed that only the user Gerald has a considerable number of reviews and fans, while the rest of the user varies his position in the top. So it can be concluded that there is NO CORRELATION BETWEEN a greater number of reviews and a greater number of fans.
Table 1. Total_ reviews vs Total_fans. Order by number of reviews in form descendent.
name | total number of reviews | total number of fans |
---|---|---|
Gerald
Sara Yuri .Hon William |
2000
1629 1339 1246 1215 |
253
50 76 101 126 |
Table 2. Total_ reviews vs Total_fans. Order by number of fans in form descendent.
name | total number of reviews | total number of fans |
---|---|---|
Amy
Mimi Harald Gerald Christine |
609
968 1153 2000 930 |
503
497 311 253 173 |
-- Code Used --
+ SELECT name
+ ,sum(review_count) AS "total number of reviews"
+ ,sum(fans) AS "total number of fans"
+ FROM user
+ GROUP BY id
+ ORDER BY sum(review_count) DESC -- table 1
+ ORDER BY sum(fans) DESC -- table 2
Answer:
YES, there are 1780 reviews with the word "love" and only 232 reviews with the word "hate".-- Code Used --
+ SELECT (SELECT count(id)
+ From review
+ WHERE text LIKE "%love%") AS "love"
+ ,(SELECT count(id)
+ From review
+ WHERE text LIKE "%hate%") AS "hate"
Answer:
YES, there are 1780 reviews with the word "love" and only 232 reviews with the word "hate".name | number of fans |
---|---|
Amy
Mimi Harald Gerald Christine Lisa Cat William Fran Lissa |
503
497 311 253 173 159 133 126 124 120 |
-- Code Used --
+ SELECT name
+ ,SUM(fans) AS "number of fans"
+ FROM user
+ GROUP BY Id
+ ORDER BY fans DESC
+ LIMIT 10
Answer:
YES, there are 1780 reviews with the word "love" and only 232 reviews with the word "hate".