# OpenClassrooms - AI Engineer
# Project 10 - Fly Me - Part 1: Data analysis and preparation
# Build a chatbot to book vacations

## Project goal:
- **Build an MVP that helps Fly Me employees easily book a plane ticket for their vacation**

## Plan:
- **Part 1: Data analysis and preparation**
    - Loading the data
    - Raw data analysis
    - Parsing the data for LUIS
    - Saving the parsed data for LUIS
    - Analysis of the parsed data for LUIS
    - Cleaning the parsed data for LUIS
    - Saving the parsed and cleaned data for LUIS


- **Part 2: Modeling with LUIS**
    - Loading the data
    - Creating the application (LUISAuthoringClient)
    - Creating the training examples: utterances, entities and entity values
        - Adding examples for the intent: OrderTravel
        - Adding examples for the intent: Greetings
        - Adding examples for the intent: None
    - Training the model
    - Publishing the model
    - Evaluating the model
        - Creating the application client (LUISRuntimeClient)
        - Evaluation on the test set


- **Part 3: Modeling with Microsoft Bot Framework**
    - See the GitHub repository


- **Part 4: Tests**
    - See the GitHub repository


- **Part 5: Deployment**
    - Done with Azure + GitHub Actions


- **Part 6: Monitoring**
    - Done with Azure Application Insights + a written methodology

## Script Part 1: Data analysis and preparation

### Preliminary remark:
- **The code in this script is split into functions, which makes it easier to read and maintain**

In [1]:
import pandas as pd
import numpy as np

# Loading the data

In [2]:
# pd.read_json is the public entry point for the same reader
df_data = pd.read_json('../data/frames.json')
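
As a minimal sketch, the load step can be wrapped in a helper that fails early when an expected column is missing. The helper name `load_frames`, the column check and the tiny stand-in file are illustrative assumptions; only the column names come from this notebook.

```python
import json
import os
import tempfile

import pandas as pd

# Columns seen in the notebook's raw data output.
EXPECTED_COLUMNS = {"user_id", "turns", "wizard_id", "id", "labels"}


def load_frames(path):
    """Load the dialogues file and check that the expected columns exist.

    Hypothetical helper, not part of the original notebook.
    """
    df = pd.read_json(path)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Tiny stand-in file so the sketch runs without the real dataset.
sample = [{
    "user_id": "U1", "wizard_id": "W1", "id": "abc",
    "turns": [{"text": "Hi", "labels": {"acts": []}}],
    "labels": {"userSurveyRating": 4.0},
}]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "frames_sample.json")
    with open(sample_path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    df_sample = load_frames(sample_path)

print(len(df_sample), df_sample["turns"][0][0]["text"])
```

With the real dataset the call would simply be `load_frames('../data/frames.json')`.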

# Raw data analysis

In [3]:
df_data

Unnamed: 0,user_id,turns,wizard_id,id,labels
0,U22HTHYNP,[{'text': 'I'd like to book a trip to Atlantis...,U21DKG18C,e2c0fc6c-2134-4891-8353-ef16d8412c9a,"{'userSurveyRating': 4.0, 'wizardSurveyTaskSuc..."
1,U21E41CQP,"[{'text': 'Hello, I am looking to book a vacat...",U21DMV0KA,4a3bfa39-2c22-42c8-8694-32b4e34415e9,"{'userSurveyRating': 3.0, 'wizardSurveyTaskSuc..."
2,U21RP4FCY,[{'text': 'Hello there i am looking to go on a...,U21E0179B,6e67ed28-e94c-4fab-96b6-68569a92682f,"{'userSurveyRating': 2.0, 'wizardSurveyTaskSuc..."
3,U22HTHYNP,[{'text': 'Hi I'd like to go to Caprica from B...,U21DKG18C,5ae76e50-5b48-4166-9f6d-67aaabd7bcaa,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc..."
4,U21E41CQP,"[{'text': 'Hello, I am looking to book a trip ...",U21DMV0KA,24603086-bb53-431e-a0d8-1dcc63518ba9,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc..."
...,...,...,...,...,...
1364,U2AMZ8TLK,[{'text': 'Hi I've got 9 days free and I'm loo...,U21DMV0KA,957fd205-bb7c-4b81-8cb6-13c81c51c5c9,"{'userSurveyRating': 3.5, 'wizardSurveyTaskSuc..."
1365,U2AMZ8TLK,[{'text': 'I need to get to Fortaleza on Septe...,U260BGVS6,71b21b86-2d05-4372-a0ee-6ed64b0ddc42,"{'userSurveyRating': 4.5, 'wizardSurveyTaskSuc..."
1366,U231PNNA3,[{'text': 'We're finally going on vacation isn...,U21T9NMKM,ef2cd70e-c1f2-42be-8839-cb465af0bf41,"{'userSurveyRating': 5.0, 'wizardSurveyTaskSuc..."
1367,U2AMZ8TLK,"[{'text': 'Hi there, I'm looking for a place t...",U21DMV0KA,ffa79d2c-14eb-45e6-8573-b0817a1a1803,"{'userSurveyRating': 4.0, 'wizardSurveyTaskSuc..."


In [4]:
len(df_data)

1369

# Parsing the data for LUIS

## An illustrative example

In [5]:
df_data['turns'][0]

[{'text': "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
  'labels': {'acts': [{'args': [{'val': 'book', 'key': 'intent'}],
     'name': 'inform'},
    {'args': [{'val': 'Atlantis', 'key': 'dst_city'},
      {'val': 'Caprica', 'key': 'or_city'},
      {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
      {'val': '8', 'key': 'n_adults'},
      {'val': '1700', 'key': 'budget'}],
     'name': 'inform'}],
   'acts_without_refs': [{'args': [{'val': 'book', 'key': 'intent'}],
     'name': 'inform'},
    {'args': [{'val': 'Atlantis', 'key': 'dst_city'},
      {'val': 'Caprica', 'key': 'or_city'},
      {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
      {'val': '8', 'key': 'n_adults'},
      {'val': '1700', 'key': 'budget'}],
     'name': 'inform'}],
   'active_frame': 1,
   'frames': [{'info': {'intent': [{'val': 'book', 'negated': False}],
      'budget': [{'val': '1700.0', 'negated': False}],

In [6]:
df_data['turns'][0][0]

{'text': "I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.",
 'labels': {'acts': [{'args': [{'val': 'book', 'key': 'intent'}],
    'name': 'inform'},
   {'args': [{'val': 'Atlantis', 'key': 'dst_city'},
     {'val': 'Caprica', 'key': 'or_city'},
     {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
     {'val': '8', 'key': 'n_adults'},
     {'val': '1700', 'key': 'budget'}],
    'name': 'inform'}],
  'acts_without_refs': [{'args': [{'val': 'book', 'key': 'intent'}],
    'name': 'inform'},
   {'args': [{'val': 'Atlantis', 'key': 'dst_city'},
     {'val': 'Caprica', 'key': 'or_city'},
     {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
     {'val': '8', 'key': 'n_adults'},
     {'val': '1700', 'key': 'budget'}],
    'name': 'inform'}],
  'active_frame': 1,
  'frames': [{'info': {'intent': [{'val': 'book', 'negated': False}],
     'budget': [{'val': '1700.0', 'negated': False}],
     'dst_city': [{

In [7]:
df_data['turns'][0][0]['text']

"I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700."

In [8]:
df_data['turns'][0][0]['labels']['acts'][0]['args'][0]['key']

'intent'

In [9]:
df_data['turns'][0][0]['labels']['acts'][1]['args']

[{'val': 'Atlantis', 'key': 'dst_city'},
 {'val': 'Caprica', 'key': 'or_city'},
 {'val': 'Saturday, August 13, 2016', 'key': 'str_date'},
 {'val': '8', 'key': 'n_adults'},
 {'val': '1700', 'key': 'budget'}]

In [10]:
df_data['turns'][0][0]['labels']['acts'][1]['args'][0]

{'val': 'Atlantis', 'key': 'dst_city'}
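
The chained lookups above get long quickly. Here is a minimal sketch of a helper that flattens the `args` of one act into a dictionary. `args_to_dict` is a hypothetical name, and the sample turn is abridged from the example above.

```python
def args_to_dict(act):
    """Map each argument key of a dialogue act to its value.

    Hypothetical helper; arguments without a 'val' field map to None.
    """
    return {arg["key"]: arg.get("val") for arg in act.get("args", [])}


# First user turn of the first dialogue, abridged from the output above.
first_turn = {
    "text": "I'd like to book a trip to Atlantis from Caprica ...",
    "labels": {"acts": [
        {"args": [{"val": "book", "key": "intent"}], "name": "inform"},
        {"args": [{"val": "Atlantis", "key": "dst_city"},
                  {"val": "Caprica", "key": "or_city"},
                  {"val": "Saturday, August 13, 2016", "key": "str_date"},
                  {"val": "8", "key": "n_adults"},
                  {"val": "1700", "key": "budget"}],
         "name": "inform"},
    ]},
}

acts = first_turn["labels"]["acts"]
intent_args = args_to_dict(acts[0])
entity_args = args_to_dict(acts[1])
print(intent_args["intent"], entity_args["dst_city"], entity_args["or_city"])
```

Looking values up by key avoids depending on the position of each argument inside `args`.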

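## Parsing function
- The **parse_data_for_luis** function parses the data so that it can later be used by the Azure LUIS service. For each dialogue it takes the first turn and extracts the text, the intent and five entities (or_city, dst_city, str_date, end_date, budget)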

In [12]:
def parse_data_for_luis(df_data_to_parse):
    """Build one row per dialogue from its first turn: text, intent, entities."""

    # columns of the returned dataframe
    entity_keys = ['or_city', 'dst_city', 'str_date', 'end_date', 'budget']
    col_names = ['text', 'intent'] + entity_keys
    rows = []

    # parse the dataframe passed as argument (not the global df_data)
    for turns in df_data_to_parse['turns']:
        first_turn = turns[0]
        acts = first_turn['labels']['acts']

        # every field defaults to an empty string
        row = dict.fromkeys(col_names, "")
        row['text'] = first_turn['text']

        # case where at least one act is specified
        if acts:
            # intent of type greeting
            if acts[0]['name'] == 'greeting':
                row['intent'] = 'greeting'

            # explicit intent: only the first argument of each act is checked
            for act in acts:
                if act['args'] and act['args'][0]['key'] == 'intent':
                    row['intent'] = act['args'][0]['val']

            # entities come from the second act when there is one (case 1),
            # otherwise from the first act (case 2)
            entity_act = acts[1] if len(acts) > 1 else acts[0]
            for arg in entity_act['args']:
                if arg['key'] in entity_keys:
                    row[arg['key']] = arg['val']

        # case where no act is specified
        else:
            row['intent'] = "none"

        rows.append(row)

    # fill the dataframe with the collected rows
    return pd.DataFrame(rows, columns=col_names)

In [13]:
df_data_parsed_luis = parse_data_for_luis(df_data)

In [14]:
df_data_parsed_luis

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
0,I'd like to book a trip to Atlantis from Capri...,book,Caprica,Atlantis,"Saturday, August 13, 2016",,1700
1,"Hello, I am looking to book a vacation from Go...",book,Gotham City,Mos Eisley,,,2100
2,Hello there i am looking to go on a vacation w...,book,,Gotham City,,,
3,"Hi I'd like to go to Caprica from Busan, betwe...",book,Busan,Caprica,"Sunday August 21, 2016","Wednesday August 31, 2016",
4,"Hello, I am looking to book a trip for 2 adult...",book,Kochi,Denver,,,"$21,300"
...,...,...,...,...,...,...,...
1364,Hi I've got 9 days free and I'm looking for a ...,book,,,,,
1365,I need to get to Fortaleza on September 8th or...,book,,Fortaleza,September 8th,,
1366,We're finally going on vacation isn't that ama...,book,,,,,15600
1367,"Hi there, I'm looking for a place to get away ...",book,,,,,


# Saving the parsed data for LUIS

In [15]:
df_data_parsed_luis.to_csv("luis_parsed_dataset.csv", index=False)
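
One thing to keep in mind if this file is read back later (for instance in Part 2): by default `pd.read_csv` turns empty fields into `NaN`, so the empty strings used here for missing entities would no longer compare equal to `''`. A small sketch on invented rows, using an in-memory buffer instead of the real file; passing `keep_default_na=False` keeps them as empty strings.

```python
import io

import pandas as pd

# Invented rows for illustration only.
df_demo = pd.DataFrame({
    "text": ["Hi!", "get me to Manaus from Toronto"],
    "intent": ["greeting", "book"],
    "or_city": ["", "Toronto"],
})

buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)

# Default behaviour: the empty string comes back as NaN.
buffer.seek(0)
df_default = pd.read_csv(buffer)

# With keep_default_na=False the empty string is preserved.
buffer.seek(0)
df_kept = pd.read_csv(buffer, keep_default_na=False)

print(df_default["or_city"].isna().tolist())
print(df_kept["or_city"].tolist())
```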

# Analysis of the parsed data for LUIS

In [16]:
df_data_parsed_luis

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
0,I'd like to book a trip to Atlantis from Capri...,book,Caprica,Atlantis,"Saturday, August 13, 2016",,1700
1,"Hello, I am looking to book a vacation from Go...",book,Gotham City,Mos Eisley,,,2100
2,Hello there i am looking to go on a vacation w...,book,,Gotham City,,,
3,"Hi I'd like to go to Caprica from Busan, betwe...",book,Busan,Caprica,"Sunday August 21, 2016","Wednesday August 31, 2016",
4,"Hello, I am looking to book a trip for 2 adult...",book,Kochi,Denver,,,"$21,300"
...,...,...,...,...,...,...,...
1364,Hi I've got 9 days free and I'm looking for a ...,book,,,,,
1365,I need to get to Fortaleza on September 8th or...,book,,Fortaleza,September 8th,,
1366,We're finally going on vacation isn't that ama...,book,,,,,15600
1367,"Hi there, I'm looking for a place to get away ...",book,,,,,


In [17]:
df_data_parsed_luis['intent'].unique()

array(['book', '', 'greeting', 'none'], dtype=object)

In [18]:
df_data_parsed_luis['intent'].value_counts()

book        1134
greeting     141
              91
none           3
Name: intent, dtype: int64
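
Part 2 of the plan uses three LUIS intents: OrderTravel, Greetings and None. Here is a minimal sketch of how the raw labels counted above could be mapped onto them. The mapping itself is an assumption, in particular sending the empty label to OrderTravel; the notebook does not state how those 91 rows are handled.

```python
import pandas as pd

# Assumed mapping from raw labels to the LUIS intents named in the plan.
INTENT_MAP = {
    "book": "OrderTravel",
    "": "OrderTravel",       # assumption: unlabeled rows look like bookings
    "greeting": "Greetings",
    "none": "None",
}

# Small invented sample containing the four raw labels.
intents = pd.Series(["book", "", "greeting", "none", "book"], name="intent")
luis_intents = intents.map(INTENT_MAP)

print(luis_intents.value_counts().to_dict())
```

Any label missing from `INTENT_MAP` would become `NaN` after `map`, which makes unexpected labels easy to spot.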

In [19]:
df_data_parsed_luis[df_data_parsed_luis['intent']=='book']

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
0,I'd like to book a trip to Atlantis from Capri...,book,Caprica,Atlantis,"Saturday, August 13, 2016",,1700
1,"Hello, I am looking to book a vacation from Go...",book,Gotham City,Mos Eisley,,,2100
2,Hello there i am looking to go on a vacation w...,book,,Gotham City,,,
3,"Hi I'd like to go to Caprica from Busan, betwe...",book,Busan,Caprica,"Sunday August 21, 2016","Wednesday August 31, 2016",
4,"Hello, I am looking to book a trip for 2 adult...",book,Kochi,Denver,,,"$21,300"
...,...,...,...,...,...,...,...
1364,Hi I've got 9 days free and I'm looking for a ...,book,,,,,
1365,I need to get to Fortaleza on September 8th or...,book,,Fortaleza,September 8th,,
1366,We're finally going on vacation isn't that ama...,book,,,,,15600
1367,"Hi there, I'm looking for a place to get away ...",book,,,,,


In [20]:
df_data_parsed_luis[df_data_parsed_luis['intent']=='greeting']

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
40,Hi!,greeting,,,,,
43,Hi! I'd like to go to Boston from Mos Eisley o...,greeting,Mos Eisley,Boston,August 15th,,
48,Heyo!,greeting,,,,,
52,Good morning.,greeting,,,,,
63,Hello wozbot!,greeting,,,,,
...,...,...,...,...,...,...,...
1188,hi there. i really wanna pretend im somewhere ...,greeting,,,,,2900
1211,Guess what? I'm a recently married person look...,greeting,osaka,manaus,,,
1223,Hi,greeting,,,,,
1251,Hi,greeting,,,,,


In [21]:
df_data_parsed_luis[df_data_parsed_luis['intent']=='none']

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
526,"Have you ever read the book ""Vernon's Travels""?",none,,,,,
657,psssstttttt,none,,,,,
1158,Vacay time woooohooooooo,none,,,,,


In [22]:
df_data_parsed_luis[df_data_parsed_luis['intent']=='']

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
32,"Hello, I have 15 vacation days available betwe...",,Theed,,June 1st,August 31st,
56,Hi. I need to book a vacation to Long Beach be...,,Paris,Long Beach,August 25,September 3,
61,Hi we're from Miami and we want to go to paris...,,Miami,paris,,,
94,Hi im fro termina and i want to go on vacation...,,,,,,
95,Hi i am looking to go to Punta Cana with my th...,,,,,,
...,...,...,...,...,...,...,...
1298,Fukuoka to Belo Horizonte. 6000. 9th to 17th.,,Fukuoka,Belo Horizonte,9th,17th,6000
1301,Hi! I have 9 days of vacation time. Can you fi...,,Ciudad Juarez,,September 20th,,
1316,Hey there! The 17 of us wants to get out of Te...,,Tel Aviv,,,,
1346,I am one adult travelling to Maceio. Do you ha...,,Osaka,Maceio,,,
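
These rows read like booking requests even though their intent is empty. `parse_data_for_luis` only inspects the first argument of each act, so one possible explanation (an assumption, not checked against the data here) is that the `intent` argument sits at a later position or is absent. Here is a sketch of a lookup that scans every argument. `find_intent` is a hypothetical helper, and the sample acts are invented for illustration.

```python
def find_intent(acts):
    """Return the value of the first 'intent' argument found in any act.

    Hypothetical helper: unlike a check on args[0] only, it scans all
    arguments. Returns "" when no act carries an intent.
    """
    for act in acts:
        for arg in act.get("args", []):
            if arg.get("key") == "intent":
                return arg.get("val", "")
    return ""


# Invented example: the intent is the second argument, not the first.
acts_example = [
    {"name": "inform",
     "args": [{"key": "or_city", "val": "Miami"},
              {"key": "intent", "val": "book"}]},
]

print(find_intent(acts_example))
print(repr(find_intent([{"name": "greeting", "args": []}])))
```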


In [23]:
for text in df_data_parsed_luis["text"]:
    print(text)

I'd like to book a trip to Atlantis from Caprica on Saturday, August 13, 2016 for 8 adults. I have a tight budget of 1700.
Hello, I am looking to book a vacation from Gotham City to Mos Eisley for $2100.
Hello there i am looking to go on a vacation with my family to Gotham City, can you help me?
Hi I'd like to go to Caprica from Busan, between Sunday August 21, 2016 and Wednesday August 31, 2016
Hello, I am looking to book a trip for 2 adults and 6 children for $21,300 or less. We are departing from Kochi for Denver.
Hey, i Want to go to St. Louis on the 17th of August
I'm looking for a trip to Gotham City leaving from Kakariko Village on Saturday, August 13, 2016. 3 adults for no more than $2400 USD.
Hello, I would like to book a 2-week trip leaving from Melbourne on August 27. I would like to go to Mannheim.
Hello, I am planning to book a trip to pittsborgh
Hi, I need to go to Mos Eisley for a wedding, leaving on Saturday, August 13, 2016 and returning on Tuesday, August 16, 2016. Pr

Looking to take my squad out to Punta Cana! I’ll pay whatever it takes to fly out of Tel Aviv
I have 8 days off coming up. Really want to get out of the country
the country being brazil
do you have flights to mexico city?
Hello
Looking to go from San Francisco to MArseille. Book me for September 18 to 22. Let me know if its more than 2800 because thats all I can afford
I need to book a business trip - something impressive... thinking… las vegas
3 adults find me the best you can out of mannheim after september 11th
Las Vegas to Mexico City for 7 please
Hello. I need Paris to Barcelonna Sept 18 to 22
6 adults to Rosario from Toluca it needs to be impressive
get me to Manaus from Toronto
I have a vacation starting next week, I can hardly contain my excitement Getting out of St. Petersburg for 12 days will be dope. What kind of destinations do you offer?
From between Sept 6 and Sept 11, I’m on break. Leaving from Columbus
im broke but wanna dip out of north vancouver
7 business class fligh

In [24]:
for or_city in df_data_parsed_luis["or_city"]:
    print(or_city)

Caprica
Gotham City

Busan
Kochi

Kakariko Village



Detroit
Vancouver, Jamaica
Columbus
Mos Eisley
Diagon Alley
Vitoria


Caprica
Godric's Hollow
Alexandria
termina
Essen
Kakariko Village

Busan
Seoul
Pittsburgh
Cairo
Curitiba

Gotham City
Theed
Munich
punta cana
Maceio
Mannheim
Porto Alegre

Diagon Alley



Mos Eisley
tampa
Leon
Miami
Indianapolis





London


Paris



Kobe
Miami








Mannheim
Coruscant

Houston
Toronto


Kakariko Village
Madrid

Houston
Los Angeles
Portland
Nagoya


Godric's Hollow


cancun


Hogsmeade
Salvador


Denver
neverland






Caprica
London

Manaus

Belo Horizonte




Chicago

Stuttgart

Valencia
Germany




Termina
Madrid



Houston


Hamburg
san diego
punta cana
Mannheim


Nagoya

Punta Cana
Brasilia
kyoto
Hiroshima


mos eisley
Montreal
Diagon Alley

Seattle

sapporo


detroit
Toluca








Ciudad Juarez

Stuttgart

san antonio
leon









north vancouver







sydney

The Veg

Los Angeles
Marseille
Calgary
denver
Santa Cruz
Sydney
Alexandria



In [25]:
for dst_city in df_data_parsed_luis["dst_city"]:
    print(dst_city)

Atlantis
Mos Eisley
Gotham City
Caprica
Denver
St. Louis
Gotham City

pittsborgh
Mos Eisley

Recife

Atlantis
Coruscant

Kakariko Village
Mexico City
Hyrule
Goiania


Fortaleza
Hogsmeade

Las Vegas
Manaus
San Juan
Sendai

Sacramento
Hogsmead

-1

Philadelphia
New York
Mannheim

Hogsmeade


Caprica
Boston
Dallas

Manaus
Tampa

Milan




Valencia

Long Beach




paris
Fukuoka







Kyoto
atlantis


Cairo

Burlington

Naples
rome
San Antonio




baltimore
Diagon Alley
Neverland


Manaus
hamburg
Mos Eisley
Marseille


Birmingham
Mos eisley

Porto

Monterrey
hiroshima

Theed


Fukuoka

San Diego
-1

theed

San Diego

Sapporo
Coruscant
Osaka
Naples
san antonio


San Juan
Atlantis




Santo Domingo
Punta Cana
Burlington

kochi
mexico
Fukuoka
Guadalajara

Phoenix
SAN FRANCISCO
Cairo
Chicago
melbourne
Phoenix

Kobe
kyoto
Hiroshima
Naples
Tijuana

Corsucant
fortaleza

Manaus
mexico city
Paris
Gotham City

curitiba
Atlanta
Lima
Paris
Belem
Toronto


Monterrey

Kyoto
Ulsan
Ulsan

toronto


Housto
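
The listing shows the same city written with different casing (`Mos Eisley` / `Mos eisley`, `Kyoto` / `kyoto`, `San Antonio` / `san antonio`). Here is a sketch of one way to surface such variants by grouping on a case-folded key. The helper name `case_variants` is hypothetical, and misspellings such as `pittsborgh` or `Corsucant` are not caught by this approach.

```python
from collections import defaultdict


def case_variants(values):
    """Group distinct spellings that differ only by case or outer spaces.

    Hypothetical helper; returns {normalized_key: sorted spellings} for
    keys that have more than one spelling. Empty values are skipped.
    """
    groups = defaultdict(set)
    for value in values:
        if value:
            groups[value.strip().casefold()].add(value)
    return {key: sorted(names) for key, names in groups.items()
            if len(names) > 1}


# A few values taken from the listing above.
cities = ["Mos Eisley", "Mos eisley", "Kyoto", "kyoto", "", "Paris", "paris",
          "Denver"]
variants = case_variants(cities)
print(variants)
```

The function accepts any iterable of strings, so a column such as `df_data_parsed_luis["dst_city"]` could be passed directly.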

In [26]:
for str_date in df_data_parsed_luis["str_date"]:
    print(str_date)

Saturday, August 13, 2016


Sunday August 21, 2016

17th of August
Saturday, August 13, 2016


Saturday, August 13, 2016
Monday, August 15, 2016
August 24


-1


Thursday, August 18, 2016

August 27, 2016

the 13th
Thursday, August 18
Saturday, August 13th
13th of august
Wednesday August 17
August 27

August 15
August 18


June 1st
Monday, August 15th



August 17
August 27th
August 13th

August 18

August 15th


August 19th
20th


August 27th





August 25



August 27









August 19



August 17



August 17



August 17
August 26th


August 13th
August 16




August 13
August 17


August 18



August 22nd
August 17th

August 19th
August 13th
August 17th



August 27th

August 27th


August 26th











August 18


August 26th


August 15th


August 27th
August 25

August 15th

26 August
August 27th

August 27th

august 27
august 27
August 19th



next month
august 27
August 27th


August 25th



August 25th






August 27


27th of august









august 30th
August 26th
Au

In [27]:
for end_date in df_data_parsed_luis["end_date"]:
    print(end_date)




Wednesday August 31, 2016





Tuesday, August 16, 2016
Wednesday, August 31, 2016



-1


Thursday, September 8, 2016




Saturday, September 3


Monday September 5
September 9


August 29


August 31st
Friday August 26th



September 7



September 4




September 2nd
31 August


August 30th





September 3



September 2













September 7












19













August 25th

September 3rd

August 31st



August 31st

September 11th


September 5th











August 26











August 29th

9 September
September 10th



31







August 30th


September 9th










September 15














19th
sep 15



September 12





September 7








30
September 3







september 6































































September 10







Tuesday








september 8
September 11th
SEPT 4

11
wed sept 21

August 31st
wednesday september 7 2016
September 11th





September 9


11th




mon
September 16th







September 5th
Monday, September 12


In [28]:
for budget in df_data_parsed_luis["budget"]:
    print(budget)

1700
2100


$21,300

$2400 USD


$3700
$3200


$3600
$3400




4300USD







6600 USD
-1








$3000














































$3300


$3500


$3900


$4600









1700






tight
$2600









4600


$3300



$2100
1500



28500

















1800



1500

$3200









































1600


5200











3600 dollars

4000



























$14900














































1700





4400$


6000$
















$3300



$2000



2500
5100

1900










0$






6200$

-1













pretty broke







3200


4900



2000






$5000









little

3100



2600 dollars



3200















16700




$3400
5700

1800
$400


19900
all my savings



3700






34800


2700






6500

3300

travel for free
4000





$1800
1400




3600







3400

1900




5400







4600




5000













3600

$4000


$6200



3200
$1000






4800


1900

2400










$2400





$4000




2700
3200
3400





# Cleaning the parsed data for LUIS

In [29]:
# work on a copy so that the parsed dataframe is left untouched
df_data_parsed_luis_cleaned = df_data_parsed_luis.copy()

## Removing '-1' values

In [30]:
df_data_parsed_luis_cleaned = df_data_parsed_luis_cleaned.replace({'-1': ""})
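
`DataFrame.replace` with a dictionary of plain strings only replaces cells whose entire value equals the key; it does not touch substrings unless `regex=True` is passed. A small sketch on invented data:

```python
import pandas as pd

# Invented values for illustration only.
df_demo = pd.DataFrame({
    "dst_city": ["Atlantis", "-1", ""],
    "budget": ["-1", "1700", "$-100"],
})

df_clean = df_demo.replace({"-1": ""})

# "-1" cells become empty; "$-100" is untouched because only whole-cell
# matches are replaced. df_demo itself is not modified.
print(df_clean["dst_city"].tolist())
print(df_clean["budget"].tolist())
```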

## Normalizing dollar amounts:
- budgets are expressed in dollars, so the currency marker can be removed to make the data consistent

In [31]:
# strip the currency markers one after the other
df_data_parsed_luis_cleaned['budget']=df_data_parsed_luis_cleaned['budget'].apply(lambda x: x.replace("$",""))
df_data_parsed_luis_cleaned['budget']=df_data_parsed_luis_cleaned['budget'].apply(lambda x: x.replace("USD",""))
# note: "dollar" is removed before "dollars", so "3600 dollars" ends up as "3600 s"
# and the last replacement never matches
df_data_parsed_luis_cleaned['budget']=df_data_parsed_luis_cleaned['budget'].apply(lambda x: x.replace("dollar",""))
df_data_parsed_luis_cleaned['budget']=df_data_parsed_luis_cleaned['budget'].apply(lambda x: x.replace("dollars",""))
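
The chained replacements leave some residue: removing `dollar` before `dollars` turns `3600 dollars` into `3600 s`, thousands separators (`21,300`) stay in place, and values such as `$2400 USD` keep a trailing space. A more compact alternative, sketched under assumptions, extracts the first numeric amount from each value instead. Free-text budgets such as `tight` or `pretty broke` become empty strings here, which is a design choice of this sketch and not something the notebook does. `normalize_budget` is a hypothetical name.

```python
import re

import pandas as pd

# First run of digits, allowing thousands separators and a decimal part.
_AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def normalize_budget(value):
    """Return the first numeric amount in a budget string, or "" if none.

    Hypothetical helper: commas are dropped, currency markers and any
    other text are ignored.
    """
    match = _AMOUNT.search(value)
    return match.group(0).replace(",", "") if match else ""


# Raw values taken from the budget listing.
budgets = pd.Series(["$21,300", "4300USD", "3600 dollars", "6200$",
                     "$2400 USD", "tight", ""])
cleaned = budgets.apply(normalize_budget)
print(cleaned.tolist())
```

Whether free-text budgets should be dropped or kept as they are depends on how the budget entity is meant to be trained in LUIS.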

In [32]:
df_data_parsed_luis_cleaned

Unnamed: 0,text,intent,or_city,dst_city,str_date,end_date,budget
0,I'd like to book a trip to Atlantis from Capri...,book,Caprica,Atlantis,"Saturday, August 13, 2016",,1700
1,"Hello, I am looking to book a vacation from Go...",book,Gotham City,Mos Eisley,,,2100
2,Hello there i am looking to go on a vacation w...,book,,Gotham City,,,
3,"Hi I'd like to go to Caprica from Busan, betwe...",book,Busan,Caprica,"Sunday August 21, 2016","Wednesday August 31, 2016",
4,"Hello, I am looking to book a trip for 2 adult...",book,Kochi,Denver,,,21300
...,...,...,...,...,...,...,...
1364,Hi I've got 9 days free and I'm looking for a ...,book,,,,,
1365,I need to get to Fortaleza on September 8th or...,book,,Fortaleza,September 8th,,
1366,We're finally going on vacation isn't that ama...,book,,,,,15600
1367,"Hi there, I'm looking for a place to get away ...",book,,,,,


In [33]:
for budget in df_data_parsed_luis_cleaned["budget"]:
    print(budget)

1700
2100


21,300

2400 


3700
3200


3600
3400




4300







6600 









3000














































3300


3500


3900


4600









1700






tight
2600









4600


3300



2100
1500



28500

















1800



1500

3200









































1600


5200











3600 s

4000



























14900














































1700





4400


6000
















3300



2000



2500
5100

1900










0






6200















pretty broke







3200


4900



2000






5000









little

3100



2600 s



3200















16700




3400
5700

1800
400


19900
all my savings



3700






34800


2700






6500

3300

travel for free
4000





1800
1400




3600







3400

1900




5400







4600




5000













3600

4000


6200



3200
1000






4800


1900

2400










2400





4000




2700
3200
3400












5800






4700

3300








4300


13700






# Saving the parsed and cleaned data for LUIS

In [34]:
df_data_parsed_luis_cleaned.to_csv("luis_parsed_cleaned_dataset.csv", index=False)