# Goal

The goal of this script is to extract the row of the `StakeholderDino_FRs.csv` spreadsheet that relates to the functional requirements discussed in a single unstructured interview, and to represent the contents of that row pictorially. The underlying aim is to let interviewees confirm that we have captured their requirements correctly, without requiring them to wrap their heads around the master spreadsheet format.

As always, start by importing modules, including the previously unused/lesser known [DataMapPlot](https://datamapplot.readthedocs.io/en/latest/index.html), which will be crucial for building an interactive mindmap-style plot. [Random](https://docs.python.org/3/library/random.html) allows for the generation of _pseudo_-random numbers.

## Potential issue

The output HTML is being reported (by Teams, Safari and Outlook) as containing malware. This is despite the package being [actively maintained on GitHub](https://github.com/TutteInstitute/datamapplot) and promoted through [matplotlib's third party package listings](https://matplotlib.org/thirdpartypackages/). Although it's not impossible that it contains malware, it is unlikely.

## Issue resolved

The issue was bypassed by setting up a GitHub repository that uses Pages to host the plots.

In [41]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datamapplot
import random  # For pseudo-random number generation
import os

To use DataMapPlot, a set of 2D coordinates (such as a 2D numpy array) and a list of labels, one for each coordinate pair, are required. As we wish to make a map where the coordinates form a circle, we shall generate them from polar coordinates inside a for loop. The number of positions to generate equals the number of top-level 'attributes' in the [master spreadsheet](https://livemanchesterac.sharepoint.com/:x:/r/sites/UOM-RIT-RLP/Shared%20Documents/New%20RLP/Open,%20Reproducible%20and%20Responsible%20Research/ORR12%20Digital%20Notebooks%20%26%20Sample%20Inventories/02%20Definition/Functional%20Requirements/Stakeholders_FRs_Bens.xlsm?d=w9d0e97af6a054764832f14b4d0730ed5&csf=1&web=1&e=N3aDly), as depicted in the innermost circle of [Miro](https://miro.com/app/board/uXjVLdds0oo=/?share_link_id=490665046930). Hence, the most sustainable way to produce the plot is to read the data in from the spreadsheet.

In [42]:
# Define the paths to the CSV files (relative to the current working directory)
base_path = './data'
attrib_df_path = os.path.join(base_path, 'AttributeDefinition.csv')
entries_df_path = os.path.join(base_path, 'DefaultEntries.csv')

# Read the CSV files
attrib_df = pd.read_csv(attrib_df_path)
entries_df = pd.read_csv(entries_df_path)

# Display the entries DataFrame
entries_df

Unnamed: 0,TL-01,G-01,U-01,U-02,U-03,CF-01,CF-02,CF-04,CF-05,CF-06,...,C-02,EF-01,EF-02,EF-03,EF-05,EF-06,IT-01,IT-02,IT-03,IT-04
0,Benchling Notebook,AGPL 3.0,No,Amharic,Online documentation,Annotation,All formats,Complete content in document format,Define own templates,Advanced/conditional search,...,ADA,Asset management,Create own plug-ins,Autoupload/folder watch,Calendar,Cross-project workflows,Command line input,No,Browser based,Cloud of own choice
1,Genemod,Apache 2.0,Unknown,Bengali,"User training (online, on site)",Barcode Scanner,Audio formats,Complete content in machine readable format,Import from internet sources,BLAST Search,...,ASTM,Freezer Management,In-house plugins,Business Logics,,Export possible,Java API,Unknown,Local client,Local
2,Labfolder (Labforward),Closed Source,Yes,Dutch,Support by provider (Consulting),Browser forms,Database formats,Direct publication option,Import of own templates,Database queries,...,CROMERRR,Instrument management,On request,Data analysis,Taskboard,Graphical,,Yes,Mobile Application,Provider's Cloud
3,LabID,MIT,,Egyptian Spoken Arabic,,Chemical editor/sketching,Direct only (CF-01),Formats suitable for long term archiving,Microtitre plate templates,File/data hierarchy,...,FDA CFR 21 Part 11,Inventory (devices),Other vendor products,Device control,Task management,Import possible,ODBC,,Responsive Design,
4,Labstep,Mozilla Public Licence 2.0,,English,,Dictation function,Document formats,Formats suitable for publication,Subject specific templates,Filtering,...,FERPA,LIMS connectivity,Widgets,,Unknown,No,Other API,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64,Sapio Platform (Triple Play),,,,,,,,,,...,,,SnapGene (Dotmatics),,,,,,,
65,SciCord,,,,,,,,,,...,,,Teams,,,,,,,
66,Sciformation,,,,,,,,,,...,,,VisioNize sense (Eppendorf ),,,,,,,
67,Scilligence,,,,,,,,,,...,,,Winchat,,,,,,,


Next, create an array of 2D coordinates, one per attribute in the sheet, which is now held in the dataframe `attrib_df`.

In [43]:
# Convert polar coordinates (rho, phi) to Cartesian coordinates (x, y)
def pol2cart(rho, phi):
    x = rho * np.cos(phi)
    y = rho * np.sin(phi)
    return (x, y)

# Take the first column, labelled 'ID'
ID_col = attrib_df.iloc[:, 0]
rho = 1  # Fix the radius
arr = np.empty((len(ID_col), 2))  # Uninitialised array, filled in the loop below

# Space the points evenly around the circle, one per attribute
for idx in range(len(ID_col)):
    phi = idx / len(ID_col) * (2 * np.pi)
    arr[idx] = pol2cart(rho, phi)
arr

array([[ 1.00000000e+00,  0.00000000e+00],
       [ 9.59492974e-01,  2.81732557e-01],
       [ 8.41253533e-01,  5.40640817e-01],
       [ 6.54860734e-01,  7.55749574e-01],
       [ 4.15415013e-01,  9.09631995e-01],
       [ 1.42314838e-01,  9.89821442e-01],
       [-1.42314838e-01,  9.89821442e-01],
       [-4.15415013e-01,  9.09631995e-01],
       [-6.54860734e-01,  7.55749574e-01],
       [-8.41253533e-01,  5.40640817e-01],
       [-9.59492974e-01,  2.81732557e-01],
       [-1.00000000e+00,  1.22464680e-16],
       [-9.59492974e-01, -2.81732557e-01],
       [-8.41253533e-01, -5.40640817e-01],
       [-6.54860734e-01, -7.55749574e-01],
       [-4.15415013e-01, -9.09631995e-01],
       [-1.42314838e-01, -9.89821442e-01],
       [ 1.42314838e-01, -9.89821442e-01],
       [ 4.15415013e-01, -9.09631995e-01],
       [ 6.54860734e-01, -7.55749574e-01],
       [ 8.41253533e-01, -5.40640817e-01],
       [ 9.59492974e-01, -2.81732557e-01]])
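
The same circle of points can also be produced without an explicit loop. The following is a minimal sketch, not the notebook's own method: the helper name `circle_points` is hypothetical, and the value 22 is taken from the number of rows in the output above rather than read from `attrib_df`.

```python
import numpy as np

def circle_points(n_points, rho=1.0):
    """Return an (n_points, 2) array of points evenly spaced on a circle."""
    # endpoint=False stops the final angle (2*pi) from duplicating the first (0)
    phi = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack((rho * np.cos(phi), rho * np.sin(phi)))

# 22 is an assumption taken from the row count of the array printed above
circle = circle_points(22)
print(circle.shape)
```

In the notebook itself, `len(attrib_df)` would presumably take the place of the hard-coded 22.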

Find the distance between two neighbouring points in order to define a smaller radius, r_2, which sets the boundary of the circular region around each level-one centre point (i.e. the coordinates already given in the array) within which the entries will be distributed.

In [44]:
diff = arr[0]-arr[1]
print(diff,diff[0],diff[1])
dist = np.sqrt(diff[0]**2+diff[1]**2)
dist

[ 0.04050703 -0.28173256] 0.04050702638550263 -0.28173255684142967


0.28462967654657023
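
As a cross-check, the distance between two neighbouring points on a circle is a chord, and for N evenly spaced points on a circle of radius rho its length is 2·rho·sin(π/N). The sketch below compares that closed form with the numerical distance; N = 22 is an assumption taken from the size of the coordinate array shown earlier, and the variable names are illustrative only.

```python
import numpy as np

n_points = 22  # assumed from the number of rows in the coordinate array
rho = 1.0

# Two neighbouring points on the circle
phi = 2.0 * np.pi / n_points
p0 = np.array([rho, 0.0])
p1 = np.array([rho * np.cos(phi), rho * np.sin(phi)])

dist_numeric = np.linalg.norm(p0 - p1)                    # Euclidean distance
dist_closed_form = 2.0 * rho * np.sin(np.pi / n_points)   # chord length
print(dist_numeric, dist_closed_form)
```

Any ring radius below half this distance keeps the rings around neighbouring centre points from touching, since two rings of radius r around centres a distance d apart only meet when 2r ≥ d.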

With the distance known, we can set the limit of r_2.

In [45]:
new_arr = []
new_labels = []

for idx1, column in enumerate(entries_df):
    # Skip 'TL-01' (the column of tool names in the table above)
    if column != 'TL-01':
        coord_origin = arr[idx1]  # Level-one centre point for this column
        col = entries_df[column].dropna()  # Ignore empty cells
        entry_count = len(col)
        for idx2, entry in enumerate(col):
            # Random alternative (random.random() returns a float in the range 0.0 <= X < 1.0):
            # coord_final = coord_origin + pol2cart(random.random()*(dist/2), random.random()*(2*np.pi))
            # Instead, space the entries evenly on a ring of radius dist/3 around the centre point
            coord_final = coord_origin + pol2cart(dist/3, (2*np.pi)*(idx2/entry_count))
            new_arr.append(coord_final)
            new_labels.append(entry)

corr_arr = np.array(new_arr)
corr_arr


array([[ 1.05436953e+00,  2.81732557e-01],
       [ 1.00693125e+00,  3.63898067e-01],
       [ 9.12054694e-01,  3.63898067e-01],
       [ 8.64616415e-01,  2.81732557e-01],
       [ 9.12054694e-01,  1.99567047e-01],
       [ 1.00693125e+00,  1.99567047e-01],
       [ 9.36130092e-01,  5.40640817e-01],
       [ 7.93815253e-01,  6.22806328e-01],
       [ 7.93815253e-01,  4.58475307e-01],
       [ 7.49737293e-01,  7.55749574e-01],
       [ 7.48022750e-01,  7.73705081e-01],
       [ 7.42941089e-01,  7.91011629e-01],
       [ 7.34675974e-01,  8.07043715e-01],
       [ 7.23526129e-01,  8.21221896e-01],
       [ 7.09894537e-01,  8.33033738e-01],
       [ 6.94273881e-01,  8.42052328e-01],
       [ 6.77228730e-01,  8.47951712e-01],
       [ 6.59375142e-01,  8.50518670e-01],
       [ 6.41358392e-01,  8.49660427e-01],
       [ 6.23829651e-01,  8.45408000e-01],
       [ 6.07422455e-01,  8.37915085e-01],
       [ 5.92729801e-01,  8.27452493e-01],
       [ 5.80282721e-01,  8.14398372e-01],
       [ 5.
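
The commented-out line in the loop hints at a random placement instead of an evenly spaced ring. If that route were taken, one possible sketch is shown below. It is an illustration under assumptions rather than the notebook's method: the function name `scatter_in_disc`, the example centre and radius, and the use of NumPy's `default_rng` (instead of the `random` module) are all choices made here.

```python
import numpy as np

def scatter_in_disc(centre, radius, n_points, rng):
    """Return n_points random positions inside a disc around `centre`."""
    # Taking the square root of a uniform sample spreads the points evenly
    # over the disc's area; the raw sample would bunch them near the centre
    r = radius * np.sqrt(rng.random(n_points))
    theta = 2.0 * np.pi * rng.random(n_points)
    offsets = np.column_stack((r * np.cos(theta), r * np.sin(theta)))
    return np.asarray(centre) + offsets

rng = np.random.default_rng(seed=0)  # fixed seed so the layout is repeatable
points = scatter_in_disc(centre=(1.0, 0.0), radius=0.1, n_points=50, rng=rng)
print(points.shape)
```

A fixed seed matters here because the plot is meant to be shown back to interviewees: without it, every run would shuffle the labels to new positions.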

With the coordinates listed, the attributes can be read in as labels, and plotted with DataMapPlot.

In [46]:
labels = attrib_df.iloc[:,1]  # Attribute names (not passed to the plot call below)
plot = datamapplot.create_interactive_plot(corr_arr, new_labels)
plot

## From plot to page

Now, we need to add so-called ['Front matter'](https://jekyllrb.com/docs/front-matter/) to the plot to ensure that it appears as a single page in our GitHub Pages site. We'll use a function to do this.

In [47]:
def front_matter(figure):
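    # Define the text to be added at the top of the HTML file.
    # Jekyll only recognises front matter that starts on the very first line,
    # so the string must begin with '---' rather than a newline.
    header_text = """---
layout: page
title: Plot
permalink: plots
---
"""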

    # Save the figure's HTML representation to a file
    figure.save('plots.html')

    # Read the saved HTML file
    with open('plots.html', 'r') as f:
        figure_html = f.read()

    # Combine the header text with the figure's HTML
    full_html = header_text + figure_html

    # Save the combined HTML to the file
    with open('plots.html', 'w') as f:
        f.write(full_html)

We can now test the function:

In [48]:
front_matter(plot)
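
Because Jekyll expects front matter to be the first thing in a file, it can be useful to check the result without rebuilding the plot. The following is a minimal standalone sketch using only the standard library. The helper `prepend_front_matter` and the demo HTML are hypothetical. The front matter values mirror those used above, and the check that skips files already starting with `---` is an added assumption so that re-running the cell does not stack headers.

```python
from pathlib import Path
import tempfile

# Front matter must begin on the first line of the file
FRONT_MATTER = "---\nlayout: page\ntitle: Plot\npermalink: plots\n---\n"

def prepend_front_matter(html_path, front_matter=FRONT_MATTER):
    """Prepend front matter to an HTML file unless it already starts with it."""
    path = Path(html_path)
    html = path.read_text(encoding="utf-8")
    if not html.startswith("---"):
        path.write_text(front_matter + html, encoding="utf-8")
    return path

# Demonstrate on a throwaway file rather than a real plot
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "plots.html"
    demo.write_text("<html><body>demo</body></html>", encoding="utf-8")
    prepend_front_matter(demo)
    prepend_front_matter(demo)  # second call leaves the file unchanged
    result = demo.read_text(encoding="utf-8")

print(result.splitlines()[0])
```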