# Data

### Context
This dataset was scraped from the TripAdvisor website and describes in detail the attractions along the Great Ocean Road (GOR) and on the Sunshine Coast (SCC).
Let's explore the attractions in both regions to gain a deeper understanding of the tourism industry in Australia.

### Content
The dataset provides the following information:

1. The scraped attraction names, region names, and corresponding URLs (attractions_gor.csv; attractions_scc.csv).

2. For each attraction in GOR and SCC, the latest 500 reviews, including the reviewer's hometown, trip type, review title, review comments, and a score (attractionReviewCommentsGOR.csv; attractionReviewCommentsSCC.csv).

3. For each attraction in GOR and SCC, the nearby eateries, with their types and price ranges, and the nearby attractions (attractionsnearbyLocationsGOR.csv; attractionReviewCommentsSCC.csv).

### Acknowledgements
The dataset was scraped from the TripAdvisor website and is intended for non-commercial research purposes only.

# Data Preparation

### 1. Importing the required libraries

In [None]:
import pandas as pd
import numpy as np


import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### 2. Loading the data into data frames and exploring it

In [None]:
attractions_gor = pd.read_csv("../input/tripadvisor-attractions-reviews-nearby-locations/attractions_gor.csv")
attractions_scc = pd.read_csv("../input/tripadvisor-attractions-reviews-nearby-locations/attractions_scc.csv")

In [None]:
attractions_gor.head()

In [None]:
attractions_gor.tail()

In [None]:
attractions_gor.info()

In [None]:
attractions_gor.shape

In [None]:
attractions_scc.head()

In [None]:
attractions_scc.tail()

In [None]:
attractions_scc.info()

In [None]:
attractions_scc.shape
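Beyond head(), info(), and shape, a quick count of missing values and duplicate rows can round out this first look. The sketch below is only an illustration: the helper name summarize and the sample frame are hypothetical, and apart from region_name the column names are assumptions rather than the dataset's confirmed schema.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return basic quality indicators for a data frame (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Number of missing values in each column
        "missing_by_column": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up sample data; only the column name region_name appears in the notebook.
sample = pd.DataFrame(
    {
        "attraction_name": ["Attraction A", "Attraction B", "Attraction B", "Attraction C"],
        "region_name": ["Region X", "Region Y", "Region Y", None],
        "url": [
            "https://example.com/a",
            "https://example.com/b",
            "https://example.com/b",
            "https://example.com/c",
        ],
    }
)

summary = summarize(sample)
print(summary)
```

The same helper could be called on attractions_gor and attractions_scc once they are loaded.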

### 3. attractions_GOR_most_visited : Data Cleaning

We create a new variable called attractions_gor_most_visited, which counts how many rows fall in each region (region_name), and remove the word "Attractions" from its index labels.

In [None]:
# value_counts() already returns the counts in descending order
attractions_gor_most_visited = attractions_gor["region_name"].value_counts()
# Remove the literal word "Attractions"; str.strip("Attractions") would strip a set of characters, not the word
attractions_gor_most_visited.index = attractions_gor_most_visited.index.str.replace("Attractions", "", regex=False).str.strip()
attractions_gor_most_visited
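One detail worth checking when removing a word from labels: str.strip takes a set of characters, not a substring, so it can also eat leading letters that happen to occur in "Attractions". The sketch below compares it with a literal replacement. The region labels are hypothetical and assume the form "<Region> Attractions".

```python
import pandas as pd

# Hypothetical labels shaped like "<Region> Attractions".
counts = pd.Series(
    [3, 2, 1],
    index=["Torquay Attractions", "Apollo Bay Attractions", "Lorne Attractions"],
)

# str.strip removes any of the characters A, t, r, a, c, i, o, n, s from both
# ends, so the leading "A" of "Apollo" is lost and a trailing space remains.
stripped = [name.strip("Attractions") for name in counts.index]

# Replacing the literal word and then trimming whitespace keeps the name intact.
cleaned = counts.copy()
cleaned.index = cleaned.index.str.replace("Attractions", "", regex=False).str.strip()

print(stripped)
print(list(cleaned.index))
```

The literal replacement leaves the region names unchanged apart from the removed word, which is usually what a label cleanup is after.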

### 4. attractions_scc_most_visited : Data Cleaning

In [None]:
attractions_scc_most_visited = attractions_scc["region_name"].value_counts()  # already sorted in descending order
attractions_scc_most_visited
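As a small illustration of the counting step, value_counts() sorts by count in descending order by default, so an ascending sort followed by a reversal gives the same result whenever the counts are distinct. The labels below are made up; only the column name region_name comes from the notebook.

```python
import pandas as pd

# Made-up region labels for illustration.
regions = pd.Series(["B", "A", "B", "C", "B", "A"], name="region_name")

# Default behaviour: most frequent value first.
counts = regions.value_counts()

# The longer form: sort ascending, then reverse.
legacy = regions.value_counts().sort_values()[-1::-1]

# head(n) then gives the n most frequent regions.
top_two = counts.head(2)

print(counts)
print(top_two)
```

When several regions share the same count, the relative order of those ties may differ between the two forms.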

### 5. attractions_GOR_most_visited : Data Visualization

In [None]:
plt.figure(figsize = (15,8))
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.barplot(x=attractions_gor_most_visited.index, y=attractions_gor_most_visited.values)
plt.xticks(rotation = 90,size = 15)
plt.yticks(size = 15)
plt.title("attractions_GOR_most_visited",size = 20)
plt.show()

### 6. attractions_scc_most_visited : Data Visualization

In [None]:
plt.figure(figsize = (15,8))
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.barplot(x=attractions_scc_most_visited.index, y=attractions_scc_most_visited.values)
plt.xticks(rotation = 90,size = 15)
plt.yticks(size = 15)
plt.title("attractions_scc_most_visited",size = 20)
plt.show()