### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
# A timeout keeps the cell from hanging when there is no route to the internet.
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation </font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>

In [None]:
# In this notebook, you will learn how to call OCI Language from a Data Science notebook and how to analyze the results.
# More specifically, you will ingest a set of hotel reviews, use OCI Language to identify the named entities in those reviews, and find the entities that are mentioned most often.
# You will not need to write any code from scratch, but you are expected to read the code and understand what it does. Press Shift+Enter in a cell to run it.

In [None]:
# Installing required libraries
!pip install pandas
!pip install ipymarkup
!pip install matplotlib

In [None]:
import oci
import pandas as pd
from ipymarkup import show_ascii_markup
from ipymarkup import show_box_markup
from matplotlib import pyplot as plt

In [79]:
Data = pd.read_csv('Data.csv')

In [78]:
Data.to_csv('e.csv', index=False)

In [74]:
Data.loc[Data.Reviews == "1000", 'Reviews'] = "6"

In [80]:
# View the first 5 rows
Data.head(5)

Unnamed: 0,id,dateAdded,dateUpdated,address,categories,primaryCategories,city,country,keys,latitude,...,reviews.dateSeen,reviews.rating,reviews.sourceURLs,Reviews,reviews.title,reviews.userCity,reviews.userProvince,reviews.username,sourceURLs,websites
0,AWV8VsCtRxPSIh2RyTvS,2018-08-27T17:01:16Z,2019-05-20T21:40:08Z,610 Poydras St,"Building,Hotels and motels,Hotel",Accommodation & Food Services,New Orleans,US,us/la/neworleans/610poydrasst/-946012914,29.949125,...,"2018-11-06T00:00:00Z,2018-08-26T00:00:00Z",3,https://www.tripadvisor.com/Hotel_Review-g6086...,The water is very hot and there's no cold wate...,"Very hot water, bad food",Honolulu,Hawaii,Stacy D,https://www.tripadvisor.com/Hotel_Review-g6086...,"http://www.whitneyhotel.com/,http://www.whitne..."
1,AWV8VsCtRxPSIh2RyTvS,2018-08-27T17:01:16Z,2019-05-20T21:40:08Z,610 Poydras St,"Building,Hotels and motels,Hotel",Accommodation & Food Services,New Orleans,US,us/la/neworleans/610poydrasst/-946012914,29.949125,...,2018-11-06T00:00:00Z,5,https://www.tripadvisor.com/Hotel_Review-g6086...,Great staff and rooms. Housekeeping was always...,Excellent hotel,New York City,NewYork,brand0nstark,https://www.tripadvisor.com/Hotel_Review-g6086...,"http://www.whitneyhotel.com/,http://www.whitne..."
2,AWV8VsCtRxPSIh2RyTvS,2018-08-27T17:01:16Z,2019-05-20T21:40:08Z,610 Poydras St,"Building,Hotels and motels,Hotel",Accommodation & Food Services,New Orleans,US,us/la/neworleans/610poydrasst/-946012914,29.949125,...,"2018-11-06T00:00:00Z,2018-08-26T00:00:00Z",2,https://www.tripadvisor.com/Hotel_Review-g6086...,"This Hotel, formerly a prestigious bank, may b...",Historic but uncomfortable,Arlington,Texas,RobertKestenbaum,https://www.tripadvisor.com/Hotel_Review-g6086...,"http://www.whitneyhotel.com/,http://www.whitne..."
3,AWV8VsCtRxPSIh2RyTvS,2018-08-27T17:01:16Z,2019-05-20T21:40:08Z,610 Poydras St,"Building,Hotels and motels,Hotel",Accommodation & Food Services,New Orleans,US,us/la/neworleans/610poydrasst/-946012914,29.949125,...,"2018-11-06T00:00:00Z,2018-08-26T00:00:00Z",4,https://www.tripadvisor.com/Hotel_Review-g6086...,Very accommodating staff. Competitive pricing ...,Accomations,Des Moines,Washington,donp638,https://www.tripadvisor.com/Hotel_Review-g6086...,"http://www.whitneyhotel.com/,http://www.whitne..."
4,AWV8VsCtRxPSIh2RyTvS,2018-08-27T17:01:16Z,2019-05-20T21:40:08Z,610 Poydras St,"Building,Hotels and motels,Hotel",Accommodation & Food Services,New Orleans,US,us/la/neworleans/610poydrasst/-946012914,29.949125,...,"2018-11-06T00:00:00Z,2018-08-26T00:00:00Z",4,https://www.tripadvisor.com/Hotel_Review-g6086...,"Room was much larger than I expected, and wate...",Great Room at Great Location,Philadelphia,Pennsylvania,B1962UTvictoriaa,https://www.tripadvisor.com/Hotel_Review-g6086...,"http://www.whitneyhotel.com/,http://www.whitne..."


In [81]:
# Create Language service client with user config default values.
ai_client = oci.ai_language.AIServiceLanguageClient(oci.config.from_file())

In [82]:
# Detect entities in each review (one service call per review)
result = []
for data in Data.Reviews:
    detect_language_entities_details = oci.ai_language.models.DetectLanguageEntitiesDetails(text=data)
    output = ai_client.detect_language_entities(detect_language_entities_details)
    result.append(output.data)
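
The loop above sends every value of the `Reviews` column to the service as-is. If the column might contain missing or blank values (an assumption; the data shown here does not confirm it), a small filter like the following sketch could be applied first. The helper name `usable_reviews` is hypothetical and is not part of the OCI SDK or pandas.

```python
def usable_reviews(reviews):
    """Yield (position, text) for every entry that is a non-empty string."""
    for pos, text in enumerate(reviews):
        # pandas stores missing values as float NaN, which is not a str,
        # so the isinstance check skips them along with other non-text values.
        if isinstance(text, str) and text.strip():
            yield pos, text


# Hypothetical sample standing in for Data.Reviews
sample = ["Great staff and rooms.", float("nan"), "   ", "Very hot water"]
print(list(usable_reviews(sample)))
```

Keeping the original position next to each text makes it possible to match service results back to the rows of the data frame even when some rows are skipped.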

In [83]:
# Set the variable below to True to show only sentences that contain PII (with the PII masked)
show_PII = False

In [84]:
# View the sentences with the detected named entities.
# no_of_sentences = len(result)
no_of_sentences = 10
# Work on a copy so that masking PII does not modify the original data.
DataTemp = Data.copy()
for res in range(no_of_sentences):
    spans = []
    show_sen = False
    if show_PII:
        for i in result[res].entities:
            if i.is_pii:
                show_sen = True
                spans += [(i.offset, i.offset+i.length, i.type+"(PII)")]
                temp_str = Data.Reviews[res][i.offset:i.offset+i.length]
                # Mask with the same number of characters so the offsets stay valid.
                DataTemp.loc[res, 'Reviews'] = DataTemp.Reviews[res].replace(temp_str, "*" * i.length)
    else:
        show_sen = True
        for i in result[res].entities:
            if i.is_pii:
                spans += [(i.offset, i.offset+i.length, i.type+"(PII)")]
            else:
                spans += [(i.offset, i.offset+i.length, i.type)]
    if show_sen:
        if show_PII:
            show_box_markup(DataTemp.Reviews[res], spans)
        else:
            show_box_markup(Data.Reviews[res], spans)
        print()
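
The masking step in the cell above uses `str.replace`, which replaces every occurrence of the entity text in the review, including occurrences the service did not flag. As a minimal sketch of an alternative, the text can be masked by character offsets instead. The helper `mask_spans` is hypothetical; it only assumes spans given as `(start, end)` pairs like those built from `offset` and `offset + length` above.

```python
def mask_spans(text, spans, mask_char="*"):
    """Return text with every (start, end) character range replaced by mask_char."""
    chars = list(text)
    for start, end in spans:
        # Clamp to the text length so an out-of-range span cannot raise IndexError.
        for pos in range(start, min(end, len(chars))):
            chars[pos] = mask_char
    return "".join(chars)


# Hypothetical review in which only the second "John" (offset 10, length 4) is flagged
review = "John said John was helpful"
print(mask_spans(review, [(10, 14)]))
```

Because each masked character is replaced one-for-one, the string length is unchanged and the remaining entity offsets still line up for `show_box_markup`.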

In [85]:
# Build a data frame of all the named entities and their types
no_of_sentences = len(result)
# no_of_sentences = 1
named_entities = []
for res in range(no_of_sentences):
    for i in result[res].entities:
        named_entities.append((i.text, i.type))
entity_frame = pd.DataFrame(named_entities,
                            columns=['Entity Name', 'Entity Type'])

In [86]:
# Transform and aggregate the data frame to find the most frequently occurring entities and their types.
top_entities = (entity_frame.groupby(by=['Entity Name', 'Entity Type'])
                           .size()
                           .sort_values(ascending=False)
                           .reset_index().rename(columns={0 : 'Frequency'}))
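
The aggregation above counts how often each (entity name, entity type) pair occurs and sorts by that count. The same idea can be sketched without pandas using `collections.Counter`, which may be handy as a quick cross-check. The sample pairs and type labels below are hypothetical stand-ins for the `named_entities` list, not actual service output.

```python
from collections import Counter

# Hypothetical (text, type) pairs standing in for the named_entities list
sample_entities = [
    ("staff", "PERSON"), ("New Orleans", "LOCATION"),
    ("staff", "PERSON"), ("breakfast", "PRODUCT"),
    ("New Orleans", "LOCATION"), ("staff", "PERSON"),
]

# Counter tallies identical tuples; most_common sorts by descending count.
counts = Counter(sample_entities)
top = counts.most_common(2)
print(top)
```

Each element of `top` pairs an (entity name, entity type) tuple with its frequency, which corresponds to one row of the aggregated data frame.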

In [None]:
no_of_top_entities=15
top_entities.iloc[:no_of_top_entities,:]

In [None]:
# Transform and aggregate the data frame to find the most frequently occurring entity types.
top_entities = (entity_frame.groupby(by=['Entity Type'])
                           .size()
                           .sort_values(ascending=False)
                           .reset_index().rename(columns={0 : 'Frequency'}))

In [None]:
no_of_top_entities = top_entities['Entity Type'].size
top_entities.iloc[:no_of_top_entities,:]

In [None]:
# Bar plot of entity type frequencies (plt.bar draws vertical bars)
fig = plt.figure(figsize =(15, 7))
plt.bar(top_entities['Entity Type'], top_entities['Frequency'])
# Show Plot
plt.show()