# Welcome to pystardog

Press the Restart & Run All button to run all the cells in this notebook and view the output.

This notebook uses `pyStardog` to connect to a Stardog Platform database server.

The database it uses is similar to the **Insurance Risk and Underwriting Demo** in the Stardog Knowledge Kits on Stardog Cloud.

We will query graph data integrated from multiple data sources to retrieve crime statistics.

## Imports

In [None]:
import io
import stardog
import pandas as pd
import seaborn as sns
from configparser import ConfigParser

## Configuration required to connect to Stardog

In [None]:
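# Get credentials from file
# file contains DEFAULT section plus override sections
config_section = 'doghouse'
parser = ConfigParser()
# ConfigParser.read() silently skips missing files, so fail early if nothing was read
if not parser.read('../CREDENTIALS.config'):
    raise FileNotFoundError('../CREDENTIALS.config not found')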

url = parser.get(config_section, 'url')
user = parser.get(config_section, 'user')
password = parser.get(config_section, 'password')
db = 'voicebox-training-healthcare'
api_endpoint = 'query'

connection_details = {
    'endpoint': url,
    'username': user,
    'password': password
}
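The credentials file is a standard INI file read by `ConfigParser`: any key missing from a named section falls back to the `[DEFAULT]` section. Below is a minimal sketch of that layout, assuming a file with the same keys the cell above reads (`url`, `user`, `password`). The section name `local`, the helper `read_connection_details`, the host name, and every value are hypothetical placeholders.

```python
from configparser import ConfigParser

# Hypothetical contents of ../CREDENTIALS.config (all values are placeholders).
# Section lines and keys must not be indented, or ConfigParser treats them
# as continuation lines.
SAMPLE_CONFIG = """
[DEFAULT]
url = http://localhost:5820
user = admin
password = changeme

[doghouse]
url = https://example.stardog.cloud:5820
user = alice

[local]
"""


def read_connection_details(text, section):
    """Build the keyword arguments passed to stardog.Connection.

    Keys missing from `section` are looked up in [DEFAULT].
    """
    parser = ConfigParser()
    parser.read_string(text)
    return {
        'endpoint': parser.get(section, 'url'),
        'username': parser.get(section, 'user'),
        'password': parser.get(section, 'password'),
    }
```

Here `doghouse` overrides only `url` and `user`, so its `password` comes from `[DEFAULT]`, while the empty `local` section inherits every value.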


In [None]:
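# Show the connection settings without echoing the password into the notebook output
{key: ('***' if key == 'password' else value) for key, value in connection_details.items()}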

## Connect to the Stardog database

In [None]:
conn = stardog.Connection(db, **connection_details)

# Start a transaction; the update in the next cell takes effect when it is committed
conn.begin()

## OK, let's materialize a virtual graph!

In [None]:
vg_name = "accenturetictest__data__antifraud"
dataset_graph_name = "urn:antifraud:materialized"

# Copy the virtual graph's triples into a named graph, then commit the open transaction
conn.update(f'ADD <virtual://{vg_name}> TO <{dataset_graph_name}>')
conn.commit()
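`ADD <virtual://...> TO <graph>` is a SPARQL Update operation that copies the triples visible through the virtual graph into a regular named graph, so later queries read stored data instead of going back to the source system. The sketch below wraps the same begin/update/commit sequence in a helper that rolls the transaction back if the update fails. It assumes `conn` exposes pyStardog's `begin()`, `update()`, `commit()` and `rollback()` methods; the helper name `materialize_virtual_graph` is hypothetical.

```python
def materialize_virtual_graph(conn, vg_name, graph_iri):
    """Copy a virtual graph into a named graph inside one transaction.

    `conn` is assumed to be a pyStardog Connection (or any object with the
    same begin/update/commit/rollback methods) with no transaction open.
    On failure the transaction is rolled back and the error re-raised.
    """
    update = f'ADD <virtual://{vg_name}> TO <{graph_iri}>'
    conn.begin()
    try:
        conn.update(update)
        conn.commit()
    except Exception:
        # Undo the partial transaction before propagating the error
        conn.rollback()
        raise
    return update
```

In this notebook it would be called as `materialize_virtual_graph(conn, vg_name, dataset_graph_name)` in place of the separate `conn.begin()` and `conn.commit()` calls, since it opens its own transaction.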


## Query the database

This query returns the crime stats for Washington DC by crime type and zip code.

In [None]:
query = """
PREFIX sqs: <tag:stardog:api:sqs:>
PREFIX : <http://api.stardog.com/>

SELECT * {
    ?statIRI a :Crime_Stats ;
             :Crime_Type ?offense ;
             :Crime_Count ?crimeCount ;
             :Crime_Zip ?zipCode ;
             :Occurred_In ?zipCodeIri .
    ?zipCodeIri a :Zip_Codes .
}
"""

csv_results = conn.select(query, content_type='text/csv')
df = pd.read_csv(io.BytesIO(csv_results))
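With `content_type='text/csv'`, `conn.select` returns the results as bytes in the SPARQL CSV results format: one header row holding the variable names (without the `?`), then one row per solution, with IRIs written without angle brackets. The sketch below parses a small hand-written sample of that shape. The sample rows and the helper `csv_bytes_to_frame` are hypothetical, and the explicit numeric conversion of `crimeCount` is a defensive assumption rather than something the notebook requires.

```python
import io

import pandas as pd

# Hypothetical sample in the shape conn.select(query, content_type='text/csv')
# returns: bytes, header row of variable names, IRIs without angle brackets.
sample_csv = (
    b"statIRI,offense,crimeCount,zipCode,zipCodeIri\n"
    b"http://api.stardog.com/stat1,THEFT,120,20001,http://api.stardog.com/zip20001\n"
    b"http://api.stardog.com/stat2,ROBBERY,35,20002,http://api.stardog.com/zip20002\n"
)


def csv_bytes_to_frame(data):
    """Parse SPARQL CSV result bytes into a DataFrame."""
    frame = pd.read_csv(io.BytesIO(data))
    # Make sure counts are numeric so they can be summed; unparsable values become NaN
    frame['crimeCount'] = pd.to_numeric(frame['crimeCount'], errors='coerce')
    return frame
```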

## Preview Crime Data

In [None]:
df.head()

## Plot total crime counts by offense in Washington DC

In [None]:
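# Plot a bar chart of total crimes per offense type
import matplotlib.pyplot as plt

# Set the style before plotting so it applies to this figure
sns.set_theme(style="darkgrid")
plt.figure(figsize=[5, 4])
# Sum crimeCount per offense (rather than counting rows), largest first
df.groupby('offense')['crimeCount'].sum().sort_values(ascending=False).plot(kind="bar")
plt.show()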
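The bar chart sums `crimeCount` within each offense, which is different from counting result rows with `value_counts()`. A small sketch on made-up rows shows the difference; the `sample` frame and its values are hypothetical, and only the column names come from the query.

```python
import pandas as pd

# Hypothetical rows shaped like the query results (values are made up)
sample = pd.DataFrame({
    'offense': ['THEFT', 'THEFT', 'ROBBERY', 'HOMICIDE'],
    'zipCode': [20001, 20002, 20001, 20002],
    'crimeCount': [120, 80, 35, 2],
})

# Total crimes per offense, largest first (what the bar chart plots)
totals = sample.groupby('offense')['crimeCount'].sum().sort_values(ascending=False)

# Number of result rows per offense, ignoring crimeCount
row_counts = sample['offense'].value_counts()
```

For `THEFT`, `totals` gives 200 while `row_counts` gives 2, so the two charts can rank offenses very differently.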

## Plot crime types in Washington DC by zip code

In [None]:
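# Plot a scatter chart of which offense types occur in which zip codes
import matplotlib.pyplot as plt

sns.set_theme(style="darkgrid")
# Create the axes first: DataFrame.plot otherwise opens its own figure and ignores plt.figure()
fig, ax = plt.subplots(figsize=[6, 5])
df.plot.scatter(x='zipCode', y='offense', ax=ax)
plt.show()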
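The scatter plot shows which offense types appear in which zip codes, but not how many crimes each point represents. If totals per zip code are what you need, a pivot table is one way to get them. The sketch below assumes the same column names as the query results and uses a hypothetical `sample` frame with made-up values.

```python
import pandas as pd

# Hypothetical rows shaped like the query results (values are made up)
sample = pd.DataFrame({
    'offense': ['THEFT', 'THEFT', 'ROBBERY', 'HOMICIDE'],
    'zipCode': [20001, 20002, 20001, 20002],
    'crimeCount': [120, 80, 35, 2],
})

# One row per zip code, one column per offense; missing combinations become 0
by_zip = pd.pivot_table(sample, index='zipCode', columns='offense',
                        values='crimeCount', aggfunc='sum', fill_value=0)

# Total crimes per zip code across all offenses, largest first
zip_totals = by_zip.sum(axis=1).sort_values(ascending=False)
```

Replacing `sample` with `df` and calling `zip_totals.plot(kind='bar')` would chart total crimes per zip code for the real results.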

### Clean up the connection

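Normally you would open the connection in a `with` statement so that it is closed automatically. This notebook keeps `conn` open across cells, so close it explicitly here.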

In [None]:
# Close the underlying HTTP connection
conn.close()
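A minimal sketch of the `with` pattern, assuming (as the note above says) that a pyStardog `Connection` can be used as a context manager that closes itself on exit. The helper `select_csv` and its factory argument are hypothetical; passing a factory keeps the function testable without a running server.

```python
def select_csv(connection_factory, query):
    """Run a SELECT query and return CSV bytes, closing the connection afterwards.

    `connection_factory` is a zero-argument callable returning a connection
    that supports the context-manager protocol and has a `select` method.
    """
    with connection_factory() as conn:
        # The connection is closed when the block exits, even on error
        return conn.select(query, content_type='text/csv')
```

In this notebook the call would look like `select_csv(lambda: stardog.Connection(db, **connection_details), query)`.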