# Create Recommendation Report for a Metadata Collection
### Notebook Goals
* Evaluate metadata for concepts and xpaths
* Create data about the collection's concepts and xpaths
* Create collection reports on the data in Excel and Google Sheets


In [None]:
# directory creation
import os
# manipulating table data
import pandas as pd
# create a dropdown (import only the name that is used, not the whole namespace)
from ipywidgets import interactive
# display widget
from IPython.display import display

# upload records for evaluation, analyze and create reports
import MDeval as md

#### Describe the metadata. 
* What organization created the records? (Organization)
* What collection are the records from? (Collection)
* What dialect are the records written in? (Dialect)

First we need to set some variables that identify where the metadata is and create a place for the resulting analysis and reports. We'll step through the process with  

In [None]:
# variables for function arguments, fill these out
Organization = 'LTER'
Collection = 'MILES'
Dialect = 'EML'

# variable created from other variables, defining where to put the metadata
MetadataLocation = './metadata/' + Organization + '/' + Collection
# create directories
os.makedirs('./data/' + Organization, exist_ok=True)
os.makedirs('./reports/' + Organization, exist_ok=True)
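
The naming scheme used throughout the notebook can be sketched in one place. This is a minimal illustration, not part of MDeval: the helper `collection_paths` and the temporary root directory are assumptions introduced so the sketch does not touch the real project tree.

```python
import os
import tempfile


def collection_paths(root, organization, collection, dialect):
    """Return the metadata, data, and report directories plus the file prefix.

    Hypothetical helper that mirrors the string concatenation in the cell above.
    """
    return {
        'metadata': os.path.join(root, 'metadata', organization, collection),
        'data': os.path.join(root, 'data', organization),
        'reports': os.path.join(root, 'reports', organization),
        'prefix': collection + '_' + dialect + '_',
    }


# Work in a temporary directory instead of the notebook's working directory.
root = tempfile.mkdtemp()
paths = collection_paths(root, 'LTER', 'MILES', 'EML')

# exist_ok=True makes the call safe to repeat when the cell is re-run.
for key in ('data', 'reports'):
    os.makedirs(paths[key], exist_ok=True)

# Every result file is the data directory plus prefix plus a fixed suffix.
concept_csv = os.path.join(
    paths['data'], paths['prefix'] + 'ConceptEvaluated.csv'
)
print(concept_csv)
```

Keeping the prefix in one place means the later CSV paths differ only in their suffix.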

#### Evaluate metadata for element content and concept content 
* Upload metadata to Metadata Evaluation Web Service
* Read returned element content of records into a dataframe
* Read returned concept content of records into a dataframe

In [None]:
md.XMLeval(MetadataLocation, Organization, Collection, Dialect)

#### Convert the conceptual CSV response of XMLeval into a dataframe. 

The concept evaluated table contains one row for each concept that the Metadata Evaluation Web Service identified in the collection.
Each row contains:
* The Collection the record is from 
* The Dialect the record is written in
* The record the concept was in
* The concept name 
* The xpath the concept was found at
* The content at that location in the document.

In [None]:
# assign csv filepath of concept results to a variable
ConceptEvaluatedCSV = os.path.join(
        './data/', Organization, Collection +
        '_' + Dialect + "_ConceptEvaluated.csv")

# read csv into pandas dataframe
ConceptDF = pd.read_csv(ConceptEvaluatedCSV, quotechar='"')

# show dataframe
ConceptDF
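
To make the shape of the table concrete, here is a small stand-in built from hypothetical rows. Only `Collection`, `Record`, and `Concept` are column names the notebook itself relies on; `Dialect`, `XPath`, and `Content` are assumed names for the remaining fields, and the record names and contents are invented.

```python
import pandas as pd

# Hypothetical rows in the layout described above (not real service output).
sample_concepts = pd.DataFrame(
    [
        ('LTER_MILES', 'EML', 'record1.xml', 'Title', '/eml:eml/dataset/title', 'Stream chemistry'),
        ('LTER_MILES', 'EML', 'record1.xml', 'Keyword', '/eml:eml/dataset/keywordSet/keyword', 'nitrate'),
        ('LTER_MILES', 'EML', 'record1.xml', 'Keyword', '/eml:eml/dataset/keywordSet/keyword', 'streams'),
        ('LTER_MILES', 'EML', 'record2.xml', 'Title', '/eml:eml/dataset/title', 'Soil moisture'),
        ('LTER_MILES', 'EML', 'record2.xml', 'Keyword', '/eml:eml/dataset/keywordSet/keyword', 'soil'),
        ('LTER_MILES', 'EML', 'record2.xml', 'Abstract', '/eml:eml/dataset/abstract', 'Hourly readings.'),
    ],
    columns=['Collection', 'Dialect', 'Record', 'Concept', 'XPath', 'Content'],
)

# Number of rows found for each concept across the collection.
rows_per_concept = sample_concepts['Concept'].value_counts()

# Number of distinct records in which each concept appears.
records_per_concept = sample_concepts.groupby('Concept')['Record'].nunique()

print(rows_per_concept)
print(records_per_concept)
```

A concept that repeats within a record, such as `Keyword` here, has more rows than records.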

#### That's a lot of rows. Let's make the dataframe display better

In [None]:
# pd.describe_option() will explain these options and more
# limit number of rows displayed
pd.set_option('display.max_rows', 20)
# widen the columns so the content is easier to see
pd.set_option('display.max_colwidth', 115)
# Display the dataframe again
ConceptDF
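
`pd.set_option` changes the display for the rest of the session. If the settings are only wanted for a single display, `pd.option_context` applies them temporarily; the sketch below uses a throwaway dataframe rather than `ConceptDF`.

```python
import pandas as pd

# Throwaway frame with long text, standing in for the evaluated content.
frame = pd.DataFrame({'Content': ['x' * 200] * 50})

# Remember the session-wide setting so it can be compared afterwards.
rows_before = pd.get_option('display.max_rows')

# Options set here apply only inside the with-block.
with pd.option_context('display.max_rows', 20, 'display.max_colwidth', 115):
    rows_inside = pd.get_option('display.max_rows')
    preview = repr(frame)

# Outside the block the previous value is restored automatically.
rows_after = pd.get_option('display.max_rows')
print(rows_inside, rows_after == rows_before)
```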

#### Convert the xpath CSV response of XMLeval into a dataframe. 

The xpath evaluated table contains one row for each xpath with text content that the Metadata Evaluation Web Service identified in the collection.
Each row contains:
* The Collection the record is from 
* The record the xpath was in
* The xpath the text was found at
* The content at that location in the document.

In [None]:
# assign csv filepath of element results to a variable
ElementEvaluatedCSV = os.path.join(
        './data/', Organization, Collection +
        '_' + Dialect + "_ElementEvaluated.csv")

# read csv into dataframe
ElementDF = pd.read_csv(ElementEvaluatedCSV, quotechar='"')

# show dataframe
ElementDF

In [None]:
# read the recommendation table (RecTag.csv) into a dataframe
RecommendationsDF = pd.read_csv('./RecTag.csv')

# create a list of recommendations
RecommendationChoices = RecommendationsDF['Recommendation'].tolist()
# remove list items that are empty
RecommendationChoices = [x for x in RecommendationChoices if str(x) != 'nan']

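# create recommendation choice function to interact with widget and recommendation list
def RecChoices(Rec):
    # the selection is stored in a global so that later cells can use it
    global Recommendation
    # take the row of the table that matches the chosen recommendation
    Recommendation = (RecommendationsDF[RecommendationsDF['Recommendation'] == Rec]).values.tolist()[0]
    # drop the empty cells that pad shorter recommendations
    Recommendation = [x for x in Recommendation if str(x) != 'nan']
    # the first value is the recommendation name itself; keep only the concepts
    del Recommendation[0]
    return Recommendation

# recommendation selector dropdown
w = interactive(RecChoices, Rec=RecommendationChoices)
# display the dropdown; the selection sets the global Recommendation list
display(w)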
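
The lookup behind the dropdown can also be written without the widget or the global variable, which makes it easier to test. This is a sketch under an assumed layout for `RecTag.csv`: one row per recommendation, the name in the `Recommendation` column, and the concepts in the remaining columns with empty cells as padding. The recommendation names, concept names, and the helper `concepts_for` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for RecTag.csv; real names and columns may differ.
rec_table = pd.DataFrame({
    'Recommendation': ['Discovery', 'Citation'],
    'Concept1': ['Title', 'Author'],
    'Concept2': ['Abstract', 'Publication Date'],
    'Concept3': ['Keyword', None],
})


def concepts_for(rec_name, table):
    """Return the concepts listed for one recommendation, in table order."""
    # First row whose Recommendation column matches the requested name.
    row = table.loc[table['Recommendation'] == rec_name].iloc[0]
    # Remove the name column, then the empty padding cells.
    return row.drop('Recommendation').dropna().tolist()


print(concepts_for('Discovery', rec_table))
print(concepts_for('Citation', rec_table))
```

Returning the list instead of setting a global lets the same function back the widget callback and any scripted run.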

#### Look at the concepts that are part of the selected recommendation

In [None]:
Recommendation

#### Limit the rows to just concepts in the selected recommendation

In [None]:
# keep only the rows whose Concept is part of the selected Recommendation
RecommendationConceptsDF = ConceptDF.loc[
    ConceptDF['Concept'].isin(Recommendation)
]
# Define a location to save the results to.
RecommendationConceptEvaluatedCSV = (
    './data/' + Organization + '/' + Collection +
    '_' + Dialect + '_RecommendationEvaluated.csv'
)
# Save the result to a CSV to add to the report
RecommendationConceptsDF.to_csv(
    RecommendationConceptEvaluatedCSV, index=False
)
# Display the resulting dataframe
RecommendationConceptsDF
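
The filter above keeps the rows that match, but it does not show which recommended concepts are absent from the collection. A short sketch with invented rows illustrates both; the concept and record names are hypothetical.

```python
import pandas as pd

# Invented evaluation rows; 'Spatial Extent' is not in the recommendation.
concepts = pd.DataFrame({
    'Record': ['record1.xml', 'record1.xml', 'record2.xml', 'record2.xml'],
    'Concept': ['Title', 'Keyword', 'Title', 'Spatial Extent'],
})
recommendation = ['Title', 'Abstract', 'Keyword']

# isin builds a boolean mask: True where the Concept is in the list.
in_recommendation = concepts.loc[concepts['Concept'].isin(recommendation)]

# Recommended concepts that no record in the collection contains.
present = set(in_recommendation['Concept'])
missing = [name for name in recommendation if name not in present]

print(len(in_recommendation), missing)
```

The `missing` list is what later shows up as empty rows when the occurrence table is reordered to match the recommendation.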


Analyze the evaluated metadata, then create a Google Sheets report on the collection. The report contains the occurrence, counts, and content of the Schema.org concepts, and the absolute content of the elements and attributes in the records.


#### Concept Occurrence Analysis
* The first row holds the number of records in the collection; read it from the *RecordCount* column
* Rows are Concepts in the Recommendation
* Columns are ConceptCount, RecordCount, AverageOccurrencePerRecord, CollectionOccurrence%
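
One plausible reading of these columns can be sketched with a pandas `groupby`. The definitions below are assumptions inferred from the column names, not taken from the MDeval source: `ConceptCount` is the number of rows for a concept, `RecordCount` is the number of distinct records that contain it, `AverageOccurrencePerRecord` divides the count by the number of records in the collection, and `CollectionOccurrence%` is the share of records that contain the concept.

```python
import pandas as pd

# Invented evaluation rows for a three-record collection.
concepts = pd.DataFrame({
    'Record': ['record1.xml', 'record1.xml', 'record1.xml', 'record2.xml', 'record3.xml'],
    'Concept': ['Title', 'Keyword', 'Keyword', 'Title', 'Keyword'],
})


def occurrence_summary(df):
    """Summarize concept occurrence under the assumed column definitions."""
    total_records = df['Record'].nunique()
    per_concept = df.groupby('Concept')['Record']
    summary = pd.DataFrame({
        'ConceptCount': per_concept.size(),      # rows per concept
        'RecordCount': per_concept.nunique(),    # distinct records per concept
    })
    # Assumed definition: occurrences averaged over every record in the collection.
    summary['AverageOccurrencePerRecord'] = summary['ConceptCount'] / total_records
    # Assumed definition: percentage of records that contain the concept.
    summary['CollectionOccurrence%'] = summary['RecordCount'] / total_records * 100
    return summary


summary = occurrence_summary(concepts)
print(summary.round(2))
```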

In [None]:
# create concept occurrence
# Define location to save to
ConceptOccurrenceCSV = (
    './data/' + Organization + '/' +
    Collection + '_' + Dialect + '_ConceptOccurrence.csv'
)
# run Concept occurrence function
ConceptOccurrenceDF = md.conceptOccurrence(
    RecommendationConceptsDF, Organization,
    Collection, Dialect, ConceptOccurrenceCSV
)
# read csv into a dataframe
ConceptOccurrenceDF = pd.read_csv(ConceptOccurrenceCSV, index_col=0)
# order the rows to follow the recommendation; recommended concepts that
# do not occur in the collection are added as empty rows
ConceptOccurrenceDF = ConceptOccurrenceDF.reindex(
    ['Number of Records'] + Recommendation
)
# fill the empty rows with zeros and the collection name
values = {
    'Collection': Organization+'_'+Collection, 'ConceptCount': 0, 'RecordCount': 0,
    'AverageOccurrencePerRecord': 0.00, 'CollectionOccurrence%': 0.00
}
ConceptOccurrenceDF = ConceptOccurrenceDF.fillna(value=values)
# write over the previous csv
ConceptOccurrenceDF.to_csv(ConceptOccurrenceCSV, mode='w')
ConceptOccurrenceDF
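
The reorder-and-fill step is easier to follow on a small table. In this sketch the values are invented; it shows that `reindex` adds a row of `NaN` for a recommended concept that was never found, and that `fillna` with a dictionary fills each column with its own default.

```python
import pandas as pd

recommendation = ['Title', 'Abstract', 'Keyword']

# Invented occurrence table, indexed by concept name in alphabetical order.
occurrence = pd.DataFrame(
    {
        'Collection': ['LTER_MILES', 'LTER_MILES', 'LTER_MILES'],
        'ConceptCount': [3, 0, 2],
        'RecordCount': [2, 3, 2],
    },
    index=['Keyword', 'Number of Records', 'Title'],
)

# Put the record total first, then follow the recommendation order.
# 'Abstract' has no row yet, so reindex creates one filled with NaN.
ordered = occurrence.reindex(['Number of Records'] + recommendation)

# Fill each column with its own default value.
filled = ordered.fillna(
    value={'Collection': 'LTER_MILES', 'ConceptCount': 0, 'RecordCount': 0}
)

# The NaN rows turned the count columns into floats; restore integers.
filled = filled.astype({'ConceptCount': int, 'RecordCount': int})
print(filled)
```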

#### Concept Counts Analysis
* Rows are records
* Columns are Concepts
* Values are the counts of each concept for the record 

In [None]:
# Define location to save to
ConceptCountsCSV = (
    './data/' + Organization + '/' +
    Collection + '_' + Dialect + '_ConceptCounts.csv'
)
# Concept counts MDeval function
occurrenceMatrix = md.conceptCounts(
    RecommendationConceptsDF, Organization,
    Collection, Dialect, ConceptCountsCSV
)
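# order columns to reflect recommendation order; a recommended concept that
# never occurs in the collection gets a column of zeros instead of a KeyError
occurrenceMatrix = occurrenceMatrix.reindex(
    columns=['Collection', 'Record'] + Recommendation, fill_value=0
)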
# write results to csv
occurrenceMatrix.to_csv(ConceptCountsCSV, mode='w', index=False)
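
A records-by-concepts matrix of this kind can be sketched with `pd.crosstab`. This illustrates the table layout described above and is not the MDeval implementation; the rows are invented. A recommended concept that never occurs has no column in a matrix built from the occurrences alone, so selecting it by label would fail, whereas `reindex` adds it as a column of zeros.

```python
import pandas as pd

recommendation = ['Title', 'Abstract', 'Keyword']

# Invented evaluation rows; no record contains an Abstract.
concepts = pd.DataFrame({
    'Record': ['record1.xml', 'record1.xml', 'record1.xml', 'record2.xml'],
    'Concept': ['Title', 'Keyword', 'Keyword', 'Title'],
})

# Rows are records, columns are concepts, values are occurrence counts.
counts = pd.crosstab(concepts['Record'], concepts['Concept'])

# Follow the recommendation order and add missing concepts as zeros.
counts = counts.reindex(columns=recommendation, fill_value=0)
print(counts)
```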

#### Xpath Occurrence Analysis

* The first row holds the number of records in the collection; read it from the *RecordCount* column
* Rows are Xpaths in the collection
* Columns are XpathCount, RecordCount, AverageOccurrencePerRecord, CollectionOccurrence%

In [None]:
# Define location to save to
XpathOccurrenceCSV = (
    './data/' + Organization + '/' + Collection +
    '_' + Dialect + '_ElementOccurrence.csv'
)
# run the xpath occurrence function
md.xpathOccurrence(
    ElementDF, Organization,
    Collection, Dialect, XpathOccurrenceCSV
)

#### Xpath Counts Analysis
* Rows are records
* Columns are Xpaths
* Values are the counts of each Xpath for the record 

In [None]:
# define location to save to
XpathCountsCSV = (
    './data/' + Organization + '/' +
    Collection + '_' + Dialect + '_ElementCounts.csv'
)
# run XpathCounts function
md.XpathCounts(
    ElementDF, Organization,
    Collection, Dialect, XpathCountsCSV
)

#### Create Collection Spreadsheet
* Write the results from the evaluations and analyses.

In [None]:
# location to save the resulting Excel spreadsheet
ReportLocation = (
    './reports/' + Organization + '/' + Organization +
    '_' + Collection + '_' + Dialect + '_Report.xlsx'
)
# Run spreadsheet function
md.collectionSpreadsheet(
    Organization, Collection, Dialect,
    RecommendationConceptEvaluatedCSV, ElementEvaluatedCSV,
    XpathOccurrenceCSV, XpathCountsCSV,
    ConceptOccurrenceCSV, ConceptCountsCSV, ReportLocation
)
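
For readers without MDeval at hand, combining several CSV files into one workbook can be approximated with `pd.ExcelWriter`. This is a rough sketch, not what `md.collectionSpreadsheet` does internally: the helpers `sheet_name_for` and `write_workbook` are hypothetical, and writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

```python
import os

import pandas as pd


def sheet_name_for(csv_path, prefix):
    """Derive a worksheet name from a CSV file name (hypothetical helper)."""
    name = os.path.splitext(os.path.basename(csv_path))[0]
    # Drop the shared 'Collection_Dialect_' prefix so only the table name remains.
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Excel limits worksheet names to 31 characters.
    return name[:31]


def write_workbook(csv_paths, prefix, report_path):
    """Write each CSV to its own worksheet of a single workbook."""
    with pd.ExcelWriter(report_path) as writer:
        for path in csv_paths:
            pd.read_csv(path).to_excel(
                writer, sheet_name=sheet_name_for(path, prefix), index=False
            )


print(sheet_name_for('./data/LTER/MILES_EML_ConceptOccurrence.csv', 'MILES_EML_'))
```

With the notebook's variables, a call might look like `write_workbook([ConceptOccurrenceCSV, ConceptCountsCSV], Collection + '_' + Dialect + '_', ReportLocation)`.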

#### Upload and convert to Google Sheets

In [None]:
# run WriteGoogleSheets function
md.WriteGoogleSheets(ReportLocation)


[Next Notebook: Create JSON-LD for Datasets Using the schema.org Vocabulary and Test the Results](./02.CreateJSON-LD.ipynb)