# Improving the AI Product of Your Startup

## Enhancing User Engagement and Insights through AI-driven Collaboration

- Notebook by [Nasr-edine DRAI](https://www.hackerrank.com/d_nasredine)
- [Openclassrooms](https://openclassrooms.com/en/)


<div style="text-align:center;">
    <img src="../../imgs/custom_seg.jpeg" width="400" />
</div>


#### Introduction

Welcome to the project "Improving the AI Product of Your Startup." In this project, you are an AI Engineer working for the startup "Avis Restau," which connects customers with restaurants. Your company aims to enhance its platform by introducing a new collaboration feature, allowing users to post reviews and photos of their favorite restaurants. Additionally, the company wants to gain better insights into the user-posted reviews.

#### Problem Statement and Objective

The primary objective of this project is to conduct a feasibility study for two specific functionalities: detecting dissatisfaction topics in user comments and automatically labeling the photos posted on the platform. To achieve this, you need to analyze existing data and collect new data to train your AI models.

#### Overview of the Dataset

The problem statement notes that the Avis Restau platform does not yet have enough data of its own, so the study relies on existing Yelp data. The recommended source is the Yelp dataset, which contains general information about restaurants, including consumer reviews: https://www.yelp.com/dataset. In this notebook, a sample of restaurants in Paris and their reviews is also collected through the Yelp API.

### Get Restaurants

In [3]:

import sys
import os
import csv
import json
import requests

# Get the current working directory
current_dir = os.getcwd()
display(current_dir)
# Get the parent directory path
parent_dir = os.path.dirname(current_dir)
display(parent_dir)
# Add the parent directory to the Python path
sys.path.append(parent_dir)
import config  # Import the config module



'/Users/drainasr-edine/github/ingenieur_ia/P6_drai_nasr-edine/notebooks'

'/Users/drainasr-edine/github/ingenieur_ia/P6_drai_nasr-edine'
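The `config` module imported above is not shown in the notebook; only its `API_KEY` attribute is used. A minimal sketch of what such a module could look like, assuming the key is kept in an environment variable named `YELP_API_KEY` (a hypothetical name) instead of being committed to the repository:

```python
# config.py (hypothetical sketch; the project's real module is not shown)
import os


def load_api_key(env=None, var_name="YELP_API_KEY"):
    """Return the API key stored in an environment variable.

    `YELP_API_KEY` is an assumed variable name, not one taken from the project.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# The notebook reads `config.API_KEY`, so the module would expose that name:
# API_KEY = load_api_key()
```

Passing a plain dictionary as `env` makes the function easy to check without touching the real environment.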

In [24]:
url = "https://api.yelp.com/v3/businesses/search"

# Define the Yelp API access token
access_token = config.API_KEY  # Access the API key from the config module

offset = 0
params = {
    "sort_by": "best_match",
    "limit": 50,
    "offset": offset,
    "location": "paris",
    "term": "restaurants"
}
headers = {
    "Authorization": f"Bearer {access_token}"    
}

# Define the file path and name
path = '../data/'
filename = "Drai_Nasredine_1_csv_062023.csv"
csv_file = path + filename

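# Open the CSV file once in write mode, so re-running the cell does not append duplicate rows
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    header_written = False

    # Fetch 4 pages of 50 results each
    for i in range(4):
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()  # Stop on an HTTP error such as an invalid API key
        data = response.json()

        # Advance the offset for the next page
        offset += 50
        params['offset'] = offset

        if not header_written:
            # Write the header row, using the keys of the first business
            writer.writerow(data['businesses'][0].keys())
            header_written = True

        # Write each business as a row
        for business in data['businesses']:
            writer.writerow(business.values())

print(f"Data successfully written to '{csv_file}'.")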


Data successfully written to '../data/Drai_Nasredine_1_csv_062023.csv'.
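Writing `business.values()` under a header taken from the first business assumes that every business has the same keys in the same order. As a hedged alternative, `csv.DictWriter` with an explicit field list keeps the columns aligned when a key is absent. The field list and the sample records below are illustrative assumptions, not data returned by the API:

```python
import csv
import io

# Assumed subset of fields; extend it to match the keys returned for your query.
FIELDNAMES = ["id", "name", "rating", "price"]


def write_businesses(file, businesses, write_header=True):
    """Write businesses as CSV rows, leaving missing fields empty."""
    writer = csv.DictWriter(file, fieldnames=FIELDNAMES,
                            restval="", extrasaction="ignore")
    if write_header:
        writer.writeheader()
    for business in businesses:
        writer.writerow(business)


# Made-up records: the second one has no "price" key.
sample = [
    {"id": "a1", "name": "Chez A", "rating": 4.5, "price": "€€"},
    {"id": "b2", "name": "Chez B", "rating": 4.0},
]
buffer = io.StringIO()
write_businesses(buffer, sample)
print(buffer.getvalue())
```

With `restval=""`, the record without a price gets an empty cell in that column instead of shifting the remaining values to the left.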


In [3]:
import requests
import json
import csv

url1 = "https://api.yelp.com/v3/businesses/search?sort_by=best_match&limit=50&offset=0&location=paris&term=restaurants"
url2 = "https://api.yelp.com/v3/businesses/search?sort_by=best_match&limit=50&offset=50&location=paris&term=restaurants"
url3 = "https://api.yelp.com/v3/businesses/search?sort_by=best_match&limit=50&offset=100&location=paris&term=restaurants"
url4 = "https://api.yelp.com/v3/businesses/search?sort_by=best_match&limit=50&offset=150&location=paris&term=restaurants"

filename = "Drai_Nasredine_1_csv_062023.csv"
path = "../data/"

# Define the Yelp API access token
access_token = config.API_KEY  # Access the API key from the config module

headers = {
    # The Yelp API expects the token to be sent as "Bearer <token>"
    'Authorization': f'Bearer {access_token}'
}

# CSV file path
csv_file = path + filename

# Open the CSV file once in write mode
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    # Create a CSV writer
    writer = csv.writer(file)

    test = 0
    for url in [url1, url2, url3, url4]:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # Stop on an HTTP error such as a rejected token
        data = response.json()

        # Write the header row once
        if test == 0:
            writer.writerow(data['businesses'][0].keys())
            test = 1

        # Write each business as a row
        for business in data['businesses']:
            writer.writerow(business.values())

print("JSON data converted to CSV successfully.")

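Indexing `data['businesses']` directly raises a `KeyError` whenever the API answers with an error payload instead of results, for example when the `Authorization` header is rejected. A small helper can turn that into a readable message. This is a sketch: the helper name is hypothetical, and the error shape `{"error": {"code": ..., "description": ...}}` is an assumption about the response format that should be checked against the API documentation:

```python
def extract_businesses(payload):
    """Return the list of businesses, or raise an error that explains the failure."""
    if "businesses" not in payload:
        # Assumed error shape: {"error": {"code": "...", "description": "..."}}
        error = payload.get("error", {})
        if not isinstance(error, dict):
            error = {"description": str(error)}
        message = error.get("description") or error.get("code") or "unknown error"
        raise ValueError(f"Yelp API request failed: {message}")
    return payload["businesses"]


# Made-up payloads for illustration
ok_payload = {"businesses": [{"id": "a1"}], "total": 1}
error_payload = {"error": {"code": "SOME_CODE", "description": "Request rejected."}}

print(extract_businesses(ok_payload))
```

Calling `extract_businesses(error_payload)` raises a `ValueError` that carries the description from the payload, which is easier to diagnose than a bare `KeyError`.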

In [14]:
import pandas as pd

In [25]:
restaurants = pd.read_csv(csv_file)
restaurants.head()

| | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | price | location | phone | display_phone | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0iLH7iQNYtoURciDpJf6w | le-comptoir-de-la-gastronomie-paris | Le Comptoir de la Gastronomie | https://s3-media3.fl.yelpcdn.com/bphoto/xT4YkC... | False | https://www.yelp.com/biz/le-comptoir-de-la-gas... | 1240 | [{'alias': 'french', 'title': 'French'}] | 4.5 | {'latitude': 48.8645157999652, 'longitude': 2.... | [] | €€ | {'address1': '34 rue Montmartre', 'address2': ... | 33142333132 | +33 1 42 33 31 32 | 370.827517 |
| 1 | IU9_wVOGBKjfqTTpAXpKcQ | bistro-des-augustins-paris | Bistro des Augustins | https://s3-media2.fl.yelpcdn.com/bphoto/ctHDHM... | False | https://www.yelp.com/biz/bistro-des-augustins-... | 472 | [{'alias': 'bistros', 'title': 'Bistros'}, {'a... | 4.5 | {'latitude': 48.854754, 'longitude': 2.342119} | [] | €€ | {'address1': '39 quai des Grands Augustins', '... | 33143540441 | +33 1 43 54 04 41 | 801.11761 |
| 2 | cEjF41ZQB8-SST8cd3EsEw | l-avant-comptoir-paris-3 | L'Avant Comptoir | https://s3-media3.fl.yelpcdn.com/bphoto/mVwgxg... | False | https://www.yelp.com/biz/l-avant-comptoir-pari... | 649 | [{'alias': 'tapas', 'title': 'Tapas Bars'}, {'... | 4.5 | {'latitude': 48.85202, 'longitude': 2.3388} | [] | €€ | {'address1': "3 carrefour de l'Odéon", 'addres... | 33142384755 | +33 1 42 38 47 55 | 1131.333887 |
| 3 | -umFmobUgpW_05m_ud1vHw | la-cordonnerie-paris-5 | La Cordonnerie | https://s3-media1.fl.yelpcdn.com/bphoto/IeTa4j... | False | https://www.yelp.com/biz/la-cordonnerie-paris-... | 92 | [{'alias': 'french', 'title': 'French'}] | 4.5 | {'latitude': 48.86543, 'longitude': 2.33237} | [] | €€€ | {'address1': '20 rue Saint Roch', 'address2': ... | 33142601742 | +33 1 42 60 17 42 | 811.376248 |
| 4 | wLgAxIB7111BcWLWh7KpFw | la-régalade-paris-3 | La Régalade | https://s3-media3.fl.yelpcdn.com/bphoto/f_-Xgg... | False | https://www.yelp.com/biz/la-r%C3%A9galade-pari... | 101 | [{'alias': 'french', 'title': 'French'}] | 4.5 | {'latitude': 48.8616441182389, 'longitude': 2.... | [] | €€€ | {'address1': '106 rue Saint-Honoré', 'address2... | 33142219240 | +33 1 42 21 92 40 | 36.28048 |
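Because the nested API fields were written with `csv.writer`, columns such as `categories` come back from the CSV as strings holding the Python representation of a list of dictionaries, for example `[{'alias': 'french', 'title': 'French'}]`. One way to recover the structure is `ast.literal_eval`. The helper name below is hypothetical, and the sketch assumes each category dictionary has a `title` key, as in the rows shown above:

```python
import ast


def category_titles(raw):
    """Turn a stringified list of category dicts into a list of titles."""
    if not isinstance(raw, str):
        # Missing values (for example NaN after pd.read_csv) yield no titles
        return []
    categories = ast.literal_eval(raw)
    return [category["title"] for category in categories]


print(category_titles("[{'alias': 'french', 'title': 'French'}]"))
```

On the DataFrame, the same function could be applied with `restaurants['categories'].apply(category_titles)`. `ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for data read from a file.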


In [None]:
restaurants.info()

In [16]:
restaurants.describe()

| | review_count | rating | distance |
|---|---|---|---|
| count | 200.0 | 200.0 | 178.0 |
| mean | 247.79 | 4.3875 | 1071.089953 |
| std | 359.081108 | 0.289851 | 839.554957 |
| min | 4.0 | 3.5 | 36.28048 |
| 25% | 47.5 | 4.0 | 446.155315 |
| 50% | 130.0 | 4.5 | 825.239777 |
| 75% | 320.0 | 4.5 | 1368.72054 |
| max | 1920.0 | 5.0 | 4281.588159 |


In [26]:
# Check for duplicate rows
duplicate_rows = restaurants.duplicated()

# Print the duplicate rows
restaurants[duplicate_rows]


| | id | alias | name | image_url | is_closed | url | review_count | categories | rating | coordinates | transactions | price | location | phone | display_phone | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 005OAKLymX9LqwM5vCiA1Q | le-machon-paris | Le Machon | https://s3-media3.fl.yelpcdn.com/bphoto/RL_Eur... | False | https://www.yelp.com/biz/le-machon-paris?adjus... | 4 | [{'alias': 'restaurants', 'title': 'Restaurant... | 5.0 | {'latitude': 48.8620888, 'longitude': 2.3661881} | [] | €€ | {'address1': '16 Rue Commines', 'address2': No... | 33142745709 | +33 1 42 74 57 09 | 1758.659234 |
| 100 | ZxtU74SJMRoB8ZQXWwNtRQ | le-soufflé-paris-2 | Le Soufflé | https://s3-media2.fl.yelpcdn.com/bphoto/dJSOGU... | False | https://www.yelp.com/biz/le-souffl%C3%A9-paris... | 310 | [{'alias': 'french', 'title': 'French'}] | 4.0 | {'latitude': 48.86645, 'longitude': 2.32647} | [] | €€€ | {'address1': '36 rue du Mont Thabor', 'address... | 33142602719 | +33 1 42 60 27 19 | 1248.905257 |
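Once duplicates are detected, they can be removed from the DataFrame with `restaurants.drop_duplicates()`. They could also be filtered out while the pages are being collected, before anything is written to the CSV file. The helper below is a hypothetical sketch of that second option, using made-up records and assuming that the `id` field identifies a business uniquely:

```python
def unique_by_id(businesses, seen_ids=None):
    """Return the businesses whose "id" has not been seen yet.

    `seen_ids` is updated in place, so the same set can be shared across pages.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    unique = []
    for business in businesses:
        if business["id"] in seen_ids:
            continue
        seen_ids.add(business["id"])
        unique.append(business)
    return unique


# Made-up pages: "b2" appears on both pages
page_1 = [{"id": "a1"}, {"id": "b2"}]
page_2 = [{"id": "b2"}, {"id": "c3"}]

seen = set()
rows = unique_by_id(page_1, seen) + unique_by_id(page_2, seen)
print([row["id"] for row in rows])
```

Sharing one `seen` set across calls is what lets the filter catch a business that is repeated on a later page.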


### Get Reviews

In [27]:
import csv
import json
import requests

# Get the list of restaurant IDs
restaurant_ids = restaurants.id.to_list()[:5]

# Define the file path and name
path = '../data/'
filename = 'Drai_Nasredine_2_csv_062023.csv'
csv_file = path + filename
test = 0
# Define the Yelp API access token
access_token = config.API_KEY  # Access the API key from the config module

# Open the CSV file in append mode
with open(csv_file, mode='a', newline='') as file:
    writer = csv.writer(file)

    # Iterate over the restaurant IDs
    for restaurant_id in restaurant_ids:
        url = f"https://api.yelp.com/v3/businesses/{restaurant_id}/reviews"
        headers = {'Authorization': f'Bearer {access_token}'}
        response = requests.get(url, headers=headers)
        data = response.json()

        # Skip restaurants for which the API returned no reviews (or an error)
        reviews = data.get('reviews')
        if not reviews:
            continue

        # Write the header row before the first review
        if test == 0:
            writer.writerow(['restaurant_id'] + list(reviews[0].keys()))
            test = 1

        # Write each review as a row
        for review in reviews:
            writer.writerow([restaurant_id] + list(review.values()))

print(f"Data successfully written to '{csv_file}'.")


Data successfully written to '../data/Drai_Nasredine_2_csv_062023.csv'.
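Since one objective of the study is to detect dissatisfaction topics, a natural next step is to isolate the low-rated reviews from the collected file. The sketch below works on an in-memory sample with made-up rows. It assumes the reviews CSV has `rating` and `text` columns, and the threshold of 2 stars is an arbitrary choice for illustration:

```python
import csv
import io


def low_rated_reviews(file, max_rating=2):
    """Return the review rows whose rating is at most `max_rating`."""
    reader = csv.DictReader(file)
    return [row for row in reader if float(row["rating"]) <= max_rating]


# Made-up sample with the assumed column names
sample_csv = (
    "restaurant_id,id,rating,text\n"
    "r1,rev1,5,Great meal\n"
    "r1,rev2,1,Cold food and slow service\n"
    "r2,rev3,2,Too noisy\n"
)

negative = low_rated_reviews(io.StringIO(sample_csv))
print([row["text"] for row in negative])
```

For the real file, the same function could be called on `open(csv_file, newline='', encoding='utf-8')` in place of the `io.StringIO` sample.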
