# Multifunctional Graph Plotter
### Ewan Miles - 06/05/2020

**This code is entirely open-source and thus editable by any user.**

This program is designed to automatically plot a graph by unpacking data from a table stored in the same folder as the program (e.g. if this program is under _C:/Desktop_, the data table must also be under _C:/Desktop_).

Note that the tables should be in **_.csv_** format, as this allows for data unpacking. Furthermore, make sure the data is in either float (**decimal point**) or integer format.

Types of graph available:
* **[2D Cartesian (x-y)](#cartesian), with fits**:

    * Weighted straight line
    * Polynomial (reduced $\chi^2$)
    
* **[Box and Whisker](#boxplot)**

# <a id="cartesian">2D Cartesian (x-y) graphs</a>

To plot cartesian graphs, the program requires certain inputs from the user, including
* the **filename**;
* the **formatting** of each series, with a name for each series in the **legend**;
* the **axis scale type** for $x$ and $y$;
* the **titles** for each axis and the graph.

Please arrange the data so that columns follow as such: $x\space data, x\space uncertainty, y\space data, y\space uncertainty$. Four columns as such constitutes a **_series_**. Place series next to each other in the adjacent four columns. Below is an example template of how to format your tables:

$$\mathbf{Series 1\kern 25em Series 2}$$
$$x\space data \kern 3em x\space uncertainty \kern 3em y\space data \kern 3em y\space uncertainty \kern 2em | \kern 2em x\space data \kern 3em x\space uncertainty \kern 3em y\space data \kern 3em y\space uncertainty$$

If you do not want horizontal error bars, set your $x \space uncertainties$ equal to 0; likewise, set the $y \space uncertainties$ equal to 0 if you do not want vertical error bars. Note that the weighted straight-line fit and the reduced $\chi^2$ calculation further down both divide by the $y \space uncertainties$, so those fits need non-zero values in that column. If the data has no uncertainty at all, leave both as 0 and learn to use a proper experimental method!
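
As a rough illustration of the layout described above, the sketch below builds a small two-series table in memory and unpacks it column by column. The header names and all of the numbers are made up for the example, and `io.StringIO` stands in for a real `.csv` file on disk.

```python
import io
import numpy as np

# Hypothetical two-series table: one header row, then four columns per series
# (x, x uncertainty, y, y uncertainty), with the series side by side.
csv_text = (
    "x1,dx1,y1,dy1,x2,dx2,y2,dy2\n"
    "1.0,0.1,2.1,0.2,1.0,0.1,2.9,0.3\n"
    "2.0,0.1,3.9,0.2,2.0,0.1,6.1,0.3\n"
    "3.0,0.1,6.2,0.2,3.0,0.1,8.8,0.3\n"
)

# Unpack column-wise, skipping the header row
columns = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, unpack=True)

seriesno = columns.shape[0] // 4      # four columns make up one series
x1, dx1, y1, dy1 = columns[0:4]       # first series
x2, dx2, y2, dy2 = columns[4:8]       # second series
print(seriesno, x1, y2)
```

Each row of `columns` is one column of the table, so a series always occupies four consecutive rows.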

Before the program proceeds with plotting the graph, it will print arrays of the unpacked tables so you can check that it is the data you expect. This will happen in a separate cell labelled **Print data to check**; if the data is correct you can run the following cells to plot the graph. It _will not print all of the data_ if there are five or more series, in order to save space in the console and keep the code running smoothly; it will instead print the first two and last two series.

In [None]:
### IMPORTING MODULES, DEFINING FUNCTIONS ###

%matplotlib notebook
#Allowing interactive plots

import numpy as np                 #Maths module
import matplotlib.pyplot as plt    #Plots graphs
import ipywidgets as wdg           #Interactive sliders, radio buttons, etc
import scipy.stats as stats        #Gaussian fits, etc 

def dataprint(var,start,end):
    """
    Function which iterates through datasets to print out series of data (used to check against data), inputs:
    - var: Unpacked dataset (e.g. [5,7,5,3,2],[5,7,89,8,87])
    - start: Startpoint for iterating through data (e.g. first series number)
    - end: Endpoint for iterating through data (e.g. last series number)
    """
    for n in range(start,end):
        print(">>> Series {0}:".format(n+1))
        for j in range(4*n,4*(n+1)):
            print(var[j])
            
def fullset(var):
    """
    Function which iterates through data series to create one full dataset including all datapoints
    Used for plotting fitted curves/lines, does not affect original data array/matrix, input:
    - var: Dataset already unpacked into array/matrix (e.g.[5,6,7,8],[0.1,0.1,0.1,0.2],...) 
    """
    #Construct empty datasets to fill over iteration
    xdata = []
    ydata = []
    xerr = []
    yerr = []
    k = 0               #Iteration variable
    while k < len(var):
        xdata.append(var[k])    #Append x dataset
        k += 1
        xerr.append(var[k])     #Append x-error dataset
        k += 1
        ydata.append(var[k])    #Append y dataset
        k += 1
        yerr.append(var[k])     #Append y-error dataset
        k += 1
    return xdata, ydata, xerr, yerr

def residuals(degree, p, dy):
    """
    Function that calculates the residuals, squared residuals and sq residual sum for a polynomial
    fit to data, which can be used to calculate the reduced chi^2 value, inputs:
    - degree: Order of fitted polynomial (e.g. quadratic degree = 2)
    - p: Coefficients of fitted polynomial found using np.polyfit
    - dy: Arrays of y-uncertainty data ONLY from the full dataset
    NOTE: It will attempt to unpack the datafile as defined at the top of this notebook, check those
    variables (e.g. data, line, seriesno, etc.) have not been redefined as something else
    Outputs, in order: residuals, square residuals, sq residual sum
    """
    #Gather separate arrays of all xpoints and ypoints
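    #x values sit in every fourth column from column 0, y values in every fourth column from column 2
    #UTF-8-sig matches the encoding used when the file is first opened (ignores a byte order mark)
    xunpack = np.loadtxt(data, delimiter=",", skiprows=line, usecols=list(range(0,4*seriesno,4)), unpack=True, encoding="UTF-8-sig")
    yunpack = np.loadtxt(data, delimiter=",", skiprows=line, usecols=list(range(2,4*seriesno,4)), unpack=True, encoding="UTF-8-sig")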

    ### THIS SECTION OF CODE COVERS THE POSSIBILITY THAT THE x VALUES IN EACH COLUMN MAY NOT BE THE SAME
    #Gather distinct values of x from array (columns may not be the same)
    xtrimmed = np.unique(xunpack)

    #Empty array for mean y points
    meanypts = np.array([])

    for i in xtrimmed:
        xpos = np.where(xunpack == i)           #Find occurrences of distinct x in full x points array
        ybar = np.mean(yunpack[xpos])           #Mean the corresponding y points in y array
        meanypts = np.append(meanypts, ybar)

    powers = np.linspace(degree,0,degree+1)

    ylinepts = []
    for i in xtrimmed:
        term = 0
        for j,n in zip(powers,p):
            term += n*(i**j)
        ylinepts.append(term)

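    #Residuals of the fitted curve from the mean y points, each divided by its y-uncertainty
    resids = []
    for y,ybar,unc in zip(ylinepts,meanypts,dy):
        resids.append((y-ybar)/unc)
    resids = np.array(resids)
    rsq = np.square(resids)
    rsum = np.sum(rsq)
    
    #Return a tuple: newer numpy versions raise an error when packing items of different shapes into one array
    return resids, rsq, rsum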

In [None]:
### SEARCH FOR FILE BASED ON USER INPUT ###

#Open the csv file and define the variable to unpack later
encoded = True                 #Loop for checking if filename is correct
while encoded == True:
    csv = str(input("What is the name of the data file? It must be a .csv file. Do not include the .csv extension, or 'quotes': "))
    data = "{0}.csv".format(csv)
    try:
        datatable = open(data, encoding="UTF-8-sig")   #Try opening the datafile as variable
        encoded = False                                #If successful break loop
    except FileNotFoundError:                          #If unsuccessful ask user to retype file name
        print("Sorry, the file could not be found. Check your input.")
        continue

firstline = datatable.readlines(1)   #Unpack first line of data

#Loop to count each instance of comma delimiter, add 1 series to count for each 4 instances of delimiter
for i in firstline:
    count = 1
    while len(i) != 0:
        loc = i.find(",")   #Cycle through and count each instance of delimiter
        if loc == -1:
            break           #Break from loop if delimiter not present
        count += 1
        i = i[loc+1:]       #Slice previous delimiter from string, re-iterate
        continue
    seriesno = int(count/4)

#Loop to find first row with floats, which is starting row of data
datatable.seek(0)                 #Return to the top of the file so the first line is checked too
line = 0                          #Number of non-data rows to skip when unpacking
for i in datatable:
    i = i.split(",")[0]           #Slice to first item in line
    try:
        float(i)                  #Attempt to make float, if not, row is not data
        break
    except ValueError:
        line += 1

datatable = open(data, encoding="UTF-8-sig")   #Open the datafile again as reading lines clears variable

#Unpacking data and creating variables for plotting the graph
var = [i for i in range(4*seriesno)]
var[:] = np.loadtxt(datatable, delimiter=",", skiprows = line, unpack=True, encoding="UTF-8")

## Print data to check

The cell below gives you the option to print the unpacked data to check whether the operation has worked successfully. Running the cell is entirely optional. Reasons for not checking include very large datasets, or having no way of checking against the csv file.

As printing 200 series of data is a waste of time and space, the cell will print four series at most. If the dataset has four series or fewer, it will print all series, but if it has five or more, it will print **the first two series and the last two series only**. It is useful to check against smaller datasets, to make sure no points have been missed by the functions above.

If your data is not being printed out correctly, check that the csv file fits all specifications for the unpacking in the text cell at the top.
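
The selection rule described above can be sketched as a small helper. The name `series_to_print` is hypothetical and is not part of the notebook; it only returns which series would be shown.

```python
def series_to_print(seriesno):
    """Return the zero-based indices of the series that would be printed."""
    if seriesno < 5:
        return list(range(seriesno))               # four or fewer: print everything
    return [0, 1, seriesno - 2, seriesno - 1]      # otherwise: first two and last two

print(series_to_print(3))
print(series_to_print(200))
```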

In [None]:
#Print dataset for user to check it is correct
if seriesno < 5:
    dataprint(var,0,seriesno)              #Only print full set if 4 series or fewer

else:
    dataprint(var,0,2)
    print("\n.......\n")
    dataprint(var,seriesno-2,seriesno)     #If more than 4 series, print first two and last two series


## Labels and gridlines

The graph and its axes will need labels; the cell below offers inputs for you to add them. It also offers an option to include gridlines on the graph, or have a blank background otherwise.

In [None]:
#Title, x axis and y axis
xlabel = str(input("What would you like to label the x axis?: "))
ylabel = str(input("What would you like to label the y axis?: "))
title = str(input("What would you like to title the graph?: "))

#Radio button widget choices for gridlines
gridradio = wdg.RadioButtons(
    options=["Yes","No"],           #Options for gridlines
    description="Gridlines?",
    disabled=False
)

gridradio        #Call the selection widget

## Scale types

For certain types of graphs, it can be advantageous to plot a logarithmic scale or other types of scale, which are available within `matplotlib`. These are presented in radio button choices below; they are declared as variables later on in the script for plotting the graph.
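
Not every scale suits every dataset: a `log` axis can only show strictly positive values, and a `logit` axis only values strictly between 0 and 1. The sketch below is one possible pre-check, assuming you want to test your data before choosing a scale; the helper name `scale_is_valid` is hypothetical and the notebook itself performs no such check.

```python
import numpy as np

def scale_is_valid(values, scale):
    """Check whether every value can be drawn on the given matplotlib scale type."""
    values = np.asarray(values, dtype=float)
    if scale == "log":
        return bool(np.all(values > 0))                     # log needs strictly positive values
    if scale == "logit":
        return bool(np.all((values > 0) & (values < 1)))    # logit needs values strictly between 0 and 1
    return True                                             # linear and symlog accept any real value

print(scale_is_valid([0.5, 2.0, 10.0], "log"))
print(scale_is_valid([-1.0, 2.0], "log"))
```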

In [None]:
#Scale types for each axis
print("There are multiple types of scale available for axes in matplotlib. These include log, linear, symlog and logit.")
print("To learn more about scale types, reading is available at https://matplotlib.org/gallery/pyplots/pyplot_scales.html#sphx-glr-gallery-pyplots-pyplot-scales-py")

#Radio button widget choices for x axis
xaxisradio = wdg.RadioButtons(
    options=["linear","log","symlog","logit"],   #Options for scale type
    value="linear",                              #Default selected scale type
    description="$x$ Axis Scale:",
    disabled=False
)

xaxisradio        #Call the selection widget

In [None]:
#Radio button widget choices for y axis
xscale = xaxisradio.value
yaxisradio = wdg.RadioButtons(
    options=["linear","log","symlog","logit"],    #Options for scale type
    value="linear",                               #Default selected scale type
    description="$y$ Axis Scale:",
    disabled=False
)

yaxisradio        #Call the selection widget

## Graph Markers

Although purely stylistic, `matplotlib` offers different marker shapes with which you can draw your data. There is a link in the code below to a website which shows most of the markers `matplotlib` has to offer. The string input below is the way to choose the marker. If you want the data to be joined point to point, add "-" after the shape selection in the string (e.g. stars joined by lines would be "*-").

In [None]:
#Defining variables for the scale types
xscale = xaxisradio.value
yscale = yaxisradio.value

#User input for dataseries shapes on plot
print("In which style would you like the data series to be cast? There are options you can find here: https://matplotlib.org/api/markers_api.html")
marker = str(input("Enter your shape of choice, as a default the data will be connected by lines between datapoints: "))

#User input for labels for each series
labels = []
j = 0                  #Iteration variable
while j < seriesno:
    lbl = str(input("Input the label for Series {0} of your data: ".format(j+1)))    #Name of series
    labels.append(lbl)
    j += 1

#Plotting the data
plt.figure()
if gridradio.value == "Yes":   #Check radio selection for gridlines, plot if wanted
    plt.grid(True)

#Loop through variable sets plotting each
index = 0
while index < seriesno:
    plt.errorbar(var[4*index], var[2+(4*index)], xerr=var[1+(4*index)], yerr=var[3+(4*index)], label=labels[index], fmt=marker)
    index += 1
    
plt.xlabel(xlabel)      #x axis label
plt.ylabel(ylabel)      #y axis label
plt.xscale(xscale)      #x scale type
plt.yscale(yscale)      #y scale type
plt.title(title)        #Title
plt.legend(loc="best");

#### SAVE CELL: The code below saves the figure with any name you give it.

**Do not run the cell if you do not wish to save the figure - the only escape to the input field is to kill the kernel.**

In [None]:
#Save figure to local directory with name chosen by user
name = str(input("Name the graph. End it with .pdf for a PDF, .png for a PNG, etc. The default is a PNG:"))
plt.savefig(name)
print("It has been saved to the file location (local directory) of this python program.")  

## Fitted straight line

The code below will plot your data once again, but with a weighted straight line fit. This includes calculating a weighted gradient and $y$-intercept. If there are multiple series of data, the program expands the $x$ and $y$ datasets by combining each series' elements and uses all of the points to construct the line.

The following equations are used in constructing a weighted line of best fit:
* The weight given to each point, $w_i$:
$$\normalsize{w_i=\frac{1}{(\Delta y_i)^2}}$$
* The equation for the weighted $m$:
$$\normalsize{m=\frac{\Sigma w_i\Sigma w_ix_iy_i-\Sigma w_ix_i\Sigma w_iy_i}{\delta}}$$
* The equation for the weighted $c$:
$$\normalsize{c=\frac{\Sigma w_ix_i^2\Sigma w_iy_i-\Sigma w_ix_i\Sigma w_ix_iy_i}{\delta}}$$
* The denominator $\delta$, shared by the equations for $m$ and $c$:
$$\normalsize{\delta=\Sigma w_i\Sigma w_ix_i^2-(\Sigma w_ix_i)^2}$$

Their uncertainties, $\Delta m$ and $\Delta c$ are given by:
$$\normalsize{\Delta m=\sqrt{\frac{\Sigma w_i}{\delta}}}$$

$$\normalsize{\Delta c=\sqrt{\frac{\Sigma x_i^2w_i}{\delta}}}$$
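
As a quick sanity check of the equations above, here is a minimal standalone sketch that applies them to made-up points lying exactly on $y = 2x + 1$. The function name `weighted_line_fit` and the data are assumptions for illustration only; the fit should recover a gradient close to 2 and an intercept close to 1.

```python
import numpy as np

def weighted_line_fit(x, y, dy):
    """Weighted straight-line fit; returns m, c and their uncertainties."""
    x, y, dy = (np.asarray(a, dtype=float) for a in (x, y, dy))
    w = 1.0 / np.square(dy)                                   # weights w_i
    sw, swx, swy = np.sum(w), np.sum(w * x), np.sum(w * y)
    swxx, swxy = np.sum(w * x * x), np.sum(w * x * y)
    delta = sw * swxx - swx**2                                # shared denominator
    m = (sw * swxy - swx * swy) / delta
    c = (swxx * swy - swx * swxy) / delta
    return m, c, np.sqrt(sw / delta), np.sqrt(swxx / delta)

# Made-up points lying exactly on y = 2x + 1, each with uncertainty 0.5
m, c, unc_m, unc_c = weighted_line_fit([0, 1, 2, 3], [1, 3, 5, 7], [0.5] * 4)
print(m, c, unc_m, unc_c)
```

Because the weights are $1/(\Delta y_i)^2$, any point with a zero $y$ uncertainty would make the weight infinite, so the fit needs non-zero uncertainties.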

In [None]:
#Construct dataset including all points if multiple dataseries, for line of best fit to cover all data
if seriesno != 1:
    xdata, ydata, xerr, yerr = fullset(var)
else:
    xdata = var[0]
    xerr = var[1]
    ydata = var[2]
    yerr = var[3]
    
#Point weights, δ denominator, slope and intercept in order, as in equations above
w = 1/(np.square(yerr)) 
delta = (np.sum(w)*np.sum(w*np.square(xdata))) - np.square(np.sum(w*xdata))
m = ((np.sum(w)*np.sum(w*xdata*ydata))-(np.sum(w*xdata)*np.sum(w*ydata)))/delta
c = ((np.sum(w*np.square(xdata))*np.sum(w*ydata))-(np.sum(w*xdata)*np.sum(w*xdata*ydata)))/delta

#Uncertainties also as above
uncm = np.sqrt(np.sum(w)/delta)
uncc = np.sqrt(np.sum(np.square(xdata)*w)/delta)

#Create arrays for plotting the line
xpoints = np.linspace(np.min(xdata),np.max(xdata),200)
ypoints = (m*xpoints) + c

#Print slope and intercept with uncertainty for user (unrounded)
print("Value of the gradient (m):\n{0} ± {1} ".format(m,uncm))
print()
print("Value of the y-intercept (c):\n{0} ± {1}".format(c,uncc))

#Plotting the data
plt.figure()
if gridradio.value == "Yes":   #Check radio selection for gridlines, plot if wanted
    plt.grid(True)

#Loop through variable sets plotting each
index = 0
while index < seriesno:
    plt.errorbar(var[4*index], var[2+(4*index)], xerr=var[1+(4*index)], yerr=var[3+(4*index)], label=labels[index], fmt=marker)
    index += 1
    
plt.plot(xpoints,ypoints,"k-",label="Weighted Line Fit")    #Line of best fit
plt.xlabel(xlabel)      #x axis label
plt.ylabel(ylabel)      #y axis label
plt.xscale(xscale)      #x scale type
plt.yscale(yscale)      #y scale type
plt.title(title)        #Title
plt.legend(loc="best");

#### SAVE CELL

In [None]:
#Save figure to local directory with name chosen by user
name = str(input("Name the graph. End it with .pdf for a PDF, .png for a PNG, etc. The default is a PNG:"))
plt.savefig(name)
print("It has been saved to the file location (local directory) of this python program.")  

## Fitted polynomial curve

The code below will plot your data once again, but with a polynomial curve fit. The `numpy` module has this capability, and allows the user to fit a polynomial curve with a user-defined order using the function `np.polyfit`. It is able to determine appropriate coefficients from the data, along with a matrix of covariance. Again, if there are multiple dataseries, it will expand the dataset to include all series under one array of $x$, $x\space uncertainty$, $y$ and $y\space uncertainty$.
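
One compact way to combine several series into single flat arrays is `np.concatenate` with a stride of four. This is only a sketch of the idea with a made-up table; the notebook's own cell builds the same kind of lists with explicit loops.

```python
import numpy as np

# Hypothetical unpacked table with two series of three points each,
# in the column order x, x uncertainty, y, y uncertainty
var = [
    np.array([1.0, 2.0, 3.0]), np.array([0.1, 0.1, 0.1]),
    np.array([2.0, 4.1, 5.9]), np.array([0.2, 0.2, 0.2]),
    np.array([1.5, 2.5, 3.5]), np.array([0.1, 0.1, 0.1]),
    np.array([3.1, 5.0, 7.2]), np.array([0.3, 0.3, 0.3]),
]

# Every fourth column, starting at offsets 0 to 3, joined into one flat array
x  = np.concatenate(var[0::4])
dx = np.concatenate(var[1::4])
y  = np.concatenate(var[2::4])
dy = np.concatenate(var[3::4])
print(x)
```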

### Reduced $\chi^2$: Automatic or User-chosen?

Choosing an appropriate order of polynomial to fit the data is important, and can be measured using something called the reduced $\chi^2$. This should be around 1 for a good fit; **the lowest order polynomial** with $\chi^2$ approaching 1 should be used, otherwise the curve is often said to be 'overfitted'. 

This program offers a pseudo-automatic route. It will automatically fit multiple polynomials to the data and calculate a reduced $\chi^2$ for each, running through polynomial orders 1 to 9.

It will output a graph mapping the reduced $\chi^2$ after execution. From there, you can make the decision to use a longer or shorter polynomial based on the change in $\chi^2$ as the order $n$ increases; if $\Delta y$ is accurate, $\chi^2$ should plateau around 1, but may not. In this case the user's best judgement is required.
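
The plateau idea can be sketched as follows: step through the orders and stop at the first one where moving to the next order lowers the reduced $\chi^2$ by less than some threshold. The helper name `suggest_order`, the threshold of 1 and the $\chi^2$ values are all assumptions made for this example.

```python
import numpy as np

def suggest_order(orders, chiset, threshold=1.0):
    """First order whose reduced chi^2 drops by less than `threshold` at the next order."""
    chiset = np.asarray(chiset, dtype=float)
    drops = chiset[:-1] - chiset[1:]          # improvement gained by each extra order
    for n, d in zip(orders, drops):
        if d < threshold:
            return int(n)
    return None                               # no plateau found in the scanned range

orders = np.arange(1, 6)
chiset = [50.0, 10.0, 1.2, 1.1, 1.0]          # made-up reduced chi^2 values
best = suggest_order(orders, chiset)
print(best)
```

A suggestion of this kind is only a starting point; the plotted curve of $\chi^2$ against order is still worth inspecting by eye.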

To calculate the reduced $\chi^2$ value for a curve fit, the code will calculate the residuals of the datapoints from the curve, and from there use the following equations to reach a result:

* The residual of any point, $d_i$:
$$\normalsize d_i=y_{line}-y_i$$

$$(for\space all\space points\space y_i)$$

* The degrees of freedom within the fit, $\nu$, where $n_{coefficients}$ is the number of coefficients in the fitted polynomial:

$$\normalsize \nu = n_{points}-n_{coefficients}$$

* The reduced $\chi^2$:

$$\normalsize{\chi^2=\frac{\sum ({\frac{d_i}{\Delta y_i}})^2}{\nu}}$$

where $\Delta y_i$ is the experimental y uncertainty.
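
The three equations above can be tried out on a toy dataset. The sketch below is a simplified version, not the notebook's `residuals` function: it assumes a single series and evaluates the residual at every point rather than averaging repeated $x$ values. The name `reduced_chi2` and the data are made up for the example.

```python
import numpy as np

def reduced_chi2(x, y, dy, degree):
    """Reduced chi^2 of an unweighted polynomial fit of the given degree."""
    x, y, dy = (np.asarray(a, dtype=float) for a in (x, y, dy))
    p = np.polyfit(x, y, degree)                 # fitted coefficients, highest power first
    d = np.polyval(p, x) - y                     # residuals d_i = y_line - y_i
    dof = len(x) - len(p)                        # nu = n_points - n_coefficients
    return np.sum(np.square(d / dy)) / dof, dof

# Made-up data lying exactly on y = x^2, with uncertainty 0.1 on every point
x = np.arange(6.0)
y = x**2
dy = np.full(6, 0.1)

chi_line, dof_line = reduced_chi2(x, y, dy, 1)   # straight line: poor fit
chi_quad, dof_quad = reduced_chi2(x, y, dy, 2)   # quadratic: matches the data
print(chi_line, chi_quad)
```

With noiseless data the quadratic gives a reduced $\chi^2$ of essentially zero, while the straight line gives a value far above 1; real data with honest uncertainties should land near 1 for a suitable order.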

In [None]:
#Construct dataset including all points if multiple dataseries, for line of best fit to cover all data
if seriesno != 1:
    xdata, ydata, dxdata, dydata = fullset(var)
    #polyfit doesn't cooperate with array of np arrays, refilling lists of x, dx, y, dy
    x = []
    y = []
    dx = []
    dy = []
    
    #x variables
    for i in xdata:
        for j in np.nditer(i):
            x.append(float(j))
            
    #y variables
    for i in ydata:
        for j in np.nditer(i):
            y.append(float(j))
            
    #dx variables
    for i in dxdata:
        for j in np.nditer(i):
            dx.append(float(j))
            
    #dy variables
    for i in dydata:
        for j in np.nditer(i):
            dy.append(float(j))
else:
    x = var[0]
    dx = var[1]
    y = var[2]
    dy = var[3]

order = np.arange(1,10)   #Range of order values for looping
chiset = np.array([])

for degree in order:
    p, v = np.polyfit(x,y,degree,cov=True)       #Polynomial coefficients, Matrix of covariance
    resids, rsq, rsum = residuals(degree,p,dy)
    npoints = len(x)
    ncoeffs = len(p)
    dof = npoints - ncoeffs   #Degrees of freedom

    #Chi calculations, append each one to array for plotting
    chi = rsum/dof
    chiset = np.append(chiset, chi)

plt.figure()
plt.plot(order,chiset,"k-")
plt.xlabel("Polynomial order $n$")
plt.ylabel(r"$\chi^2$ value")     #Raw string so the backslash in \chi is not read as an escape
plt.title("Change in $\\chi^2$ fit value as order $n$ of\nfitted polynomial increases")

#Return change in chi at each point
dchi = chiset[:-1] - chiset[1:]
print("Change in chi at each n: {0}".format(dchi))

#Suggest chi value based on change in chi
for c,n,d in zip(chiset,order,dchi):
    if d < 1:
        print("\nSuggested polynomial fit order: {0}".format(n))
        print("Chi^2 value for this polynomial: {0:0.3f}".format(c))
        print("Choose your desired order below.")
        break
        
#Allow user to input polynomial order for plot
user_order = wdg.BoundedIntText(
    value=1,
    min=1,
    max=9,
    step=1,
    description="Order $n$:",
    disabled=False
)

user_order

In [None]:
#Calculate polynomial fit based on user-chosen order
degree = user_order.value
p, v = np.polyfit(x,y,degree,cov=True)
poly = np.poly1d(p)                              #line function for plotting fitted curve

xpoints = np.linspace(np.min(x),np.max(x),200)   #x value array spanning the full data range
ypoints = poly(xpoints)                          #y value array using poly1d

res, rsq, rsum = residuals(degree,p,dy)

print("FOR POLYNOMIAL FIT OF ORDER: {0}".format(degree))
print()
    
#Printing coefficients with error
for j in range(np.size(p)):
    print("The coefficient of order x^", len(p)-j-1, " is ", p[j], " with error ", np.sqrt(np.diag(v))[j])
    print()
    
npoints = len(x)
ncoeffs = len(p)
dof = npoints - ncoeffs   #Degrees of freedom
print("The degrees of freedom:", dof)

chi = rsum/dof
print("Reduced chi^2:", chi)
print("\n\n")

#Plotting the data
plt.figure()
if gridradio.value == "Yes":   #Check radio selection for gridlines, plot if wanted
    plt.grid(True)

#Loop through variable sets plotting each
index = 0
while index < seriesno:
    plt.errorbar(var[4*index], var[2+(4*index)], xerr=var[1+(4*index)], yerr=var[3+(4*index)], label=labels[index], fmt=marker)
    index += 1
    
plt.plot(xpoints,ypoints,"k-",label=r"Polynomial fit, order {0}, $\chi^2 =$ {1:0.3f}".format(degree, chi))    #Line of best fit
plt.xlabel(xlabel)      #x axis label
plt.ylabel(ylabel)      #y axis label
plt.xscale(xscale)      #x scale type
plt.yscale(yscale)      #y scale type
plt.title(title)        #Title
plt.legend(loc="best");

#### SAVE CELL

In [None]:
#Save figure to local directory with name chosen by user
name = str(input("Name the polynomial fit graph. End it with .pdf for a PDF, .png for a PNG, etc. The default is a PNG: "))
plt.savefig(name)
print("It has been saved to the file location (local directory) of this python program.")

### The Residual Distribution

Plotting the distribution of the residuals calculated above and fitting a Gaussian curve allows the user to assess the fit of the polynomial. The mean $x_0$ should be around 0. Because each residual has already been divided by its uncertainty $\Delta y_i$ in the `residuals` function, the standard deviation $\sigma$ should be close to 1 if the uncertainties are a fair estimate of the scatter in the data.
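
To see what a well-behaved result looks like, the sketch below fits a Gaussian to simulated residuals that have been divided by their uncertainties. The residuals here are seeded random draws from a unit Gaussian, an assumption standing in for real data, so the fitted mean should come out near 0 and the fitted standard deviation near 1.

```python
import numpy as np
import scipy.stats as stats

# Stand-in for the normalised residuals: 2000 seeded draws from a unit Gaussian
rng = np.random.default_rng(0)
res = rng.normal(loc=0.0, scale=1.0, size=2000)

x0, sigma = stats.norm.fit(res)     # maximum-likelihood mean and standard deviation
print(x0, sigma)
```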

In [None]:
gx = np.linspace(-np.max(np.abs(res)),np.max(np.abs(res)),50)    #Range over which Gaussian is fitted
x0, sigma = stats.norm.fit(res)                                  #Mean and stdev for Gaussian
gaussian = stats.norm.pdf(gx,x0,sigma)                           #Gaussian curve

plt.figure()
# 15 bins, normalized:
plt.hist(res,bins=15,density=True,edgecolor='k')                 #Plot histogram
plt.plot(gx,gaussian,'r-', label="Gaussian Fit")                 #Plot Gaussian fit
plt.title("Distribution of residuals from the polynomial fit")
plt.xlabel("Residual Size (Corrected for random error)")
plt.ylabel("Normalised Occurrences") 
plt.legend(loc="best");

print("\nMean residual value:", x0, "\nStandard Deviation:", sigma)

#### SAVE CELL

In [None]:
#Save figure to local directory with name chosen by user
name = str(input("Name the graph. End it with .pdf for a PDF, .png for a PNG, etc. The default is a PNG: "))
plt.savefig(name)
print("It has been saved to the file location (local directory) of this python program.")

# <a id="boxplot">Box and Whisker plots</a>

For a box plot, the program once again requires certain inputs from the user, including
* the **filename**;
* the **axis scale type** for $x$ and $y$;
* the **whisker length**, chosen from multiple options;
* the **titles** for each axis and the graph.

Box plots plot one series of data as a median, upper quartile and lower quartile, with whiskers that typically show the maximum and minimum of the dataset. As such, they are one dimensional; this means, unlike with cartesian graphs, the data file can be set out so that each column of data will be represented by a box. Below is an example of how to split the data into columns:

$$\mathbf{Box 1\kern 13.8em Box 2\kern 13.8em Box 3}$$
$$dataset \space 1 \kern 6em | \kern 6em dataset \space 2 \kern 6em | \kern 6em dataset \space 3$$
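
For reference, the five numbers a box summarises for one column can be computed with `np.percentile`. This is a rough sketch with a made-up column; the helper name `five_number_summary` is hypothetical.

```python
import numpy as np

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum of one column."""
    values = np.asarray(values, dtype=float)
    return tuple(float(v) for v in np.percentile(values, [0, 25, 50, 75, 100]))

summary = five_number_summary([1, 2, 3, 4, 5])   # made-up column of data
print(summary)
```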

Before the program proceeds with plotting the graph, it will print arrays of the unpacked tables so you can check that it is the data you expect. This will happen in a separate cell labelled **Print data to check**; if the data is correct you can run the following cells to plot the graph. It _will not print all of the data_ if there are five or more columns, in order to save space in the console and keep the code running smoothly; it will instead print the first two and last two columns.

In [None]:
### SEARCH FOR FILE BASED ON USER INPUT ###

#Open the csv file and define the variable to unpack later
encoded = True                 #Loop for checking if filename is correct
while encoded == True:
    csv = str(input("What is the name of the data file? It must be a .csv file. Do not include the .csv extension, or 'quotes': "))
    data = "{0}.csv".format(csv)
    try:
        datatable = open(data, encoding="UTF-8-sig")   #Try opening the datafile as variable
        encoded = False                                #If successful break loop
    except FileNotFoundError:                          #If unsuccessful ask user to retype file name
        print("Sorry, the file could not be found. Check your input.")
        continue

firstline = datatable.readlines(1)   #Unpack first line of data

#Loop to count each instance of comma delimiter; each column of the table is one box
for i in firstline:
    count = 1
    while len(i) != 0:
        loc = i.find(",")   #Cycle through and count each instance of delimiter
        if loc == -1:
            break           #Break from loop if delimiter not present
        count += 1
        i = i[loc+1:]       #Slice previous delimiter from string, re-iterate
        continue
    seriesno = int(count)

#Loop to find first row with floats, which is starting row of data
datatable.seek(0)                 #Return to the top of the file so the first line is checked too
line = 0                          #Number of non-data rows to skip when unpacking
for i in datatable:
    i = i.split(",")[0]           #Slice to first item in line
    try:
        float(i)                  #Attempt to make float, if not, row is not data
        break
    except ValueError:
        line += 1

datatable = open(data, encoding="UTF-8-sig")   #Open the datafile again as reading lines clears variable

#Unpacking data and creating variables for plotting the graph
var = list(np.loadtxt(datatable, delimiter=",", skiprows = line, unpack=True, encoding="UTF-8"))   #One array per box

## Print data to check

The cell below gives you the option to print the unpacked data to check whether the operation has worked successfully. Running the cell is entirely optional. Reasons for not checking include very large datasets, or having no way of checking against the csv file.

As printing 200 series of data is a waste of time and space, the cell will print four series at most. If the dataset has four series or fewer, it will print all series, but if it has five or more, it will print **the first two series and the last two series only**. It is useful to check against smaller datasets, to make sure no points have been missed by the functions above.

If your data is not being printed out correctly, check that the csv file fits all specifications for the unpacking in the text cell at the top.

In [None]:
#Print dataset for user to check it is correct
if seriesno < 5:
    for n in range(0,seriesno):            #Only print full set if 4 series or fewer
        print(">>> Box {0}:".format(n+1))
        print(var[n])

else:
    for n in range(0,2):
        print(">>> Box {0}:".format(n+1))   #If more than 4 series, print first two and last two series
        print(var[n])  
        
    print("\n.......\n")  
    
    for n in range(seriesno-2, seriesno):
        print(">>> Box {0}:".format(n+1))
        print(var[n]) 

## Labels and gridlines

The graph and its axes will need labels; the cell below offers inputs for you to add them. It also offers an option to include gridlines on the graph, or have a blank background otherwise.

In [None]:
#Title, x axis and y axis
xlabel = str(input("What would you like to label the x axis?: "))
ylabel = str(input("What would you like to label the y axis?: "))
title = str(input("What would you like to title the graph?: "))

#Radio button widget choices for gridlines
gridradio = wdg.RadioButtons(
    options=["Yes","No"],           #Options for gridlines
    description="Gridlines?",
    disabled=False
)

gridradio        #Call the selection widget

## Scale types

For certain types of graphs, it can be advantageous to plot a logarithmic scale or other types of scale, which are available within `matplotlib`. These are presented in radio button choices below; they are declared as variables later on in the script for plotting the graph.

In [None]:
#Scale types for each axis
print("There are multiple types of scale available for axes in matplotlib. These include log, linear, symlog and logit.")
print("To learn more about scale types, reading is available at https://matplotlib.org/gallery/pyplots/pyplot_scales.html#sphx-glr-gallery-pyplots-pyplot-scales-py")

#Radio button widget choices for x axis
xaxisradio = wdg.RadioButtons(
    options=["linear","log","symlog","logit"],   #Options for scale type
    value="linear",                              #Default selected scale type
    description="$x$ Axis Scale:",
    disabled=False
)

xaxisradio        #Call the selection widget

In [None]:
#Radio button widget choices for y axis
xscale = xaxisradio.value
yaxisradio = wdg.RadioButtons(
    options=["linear","log","symlog","logit"],    #Options for scale type
    value="linear",                               #Default selected scale type
    description="$y$ Axis Scale:",
    disabled=False
)

yaxisradio        #Call the selection widget

## Whiskers

The whiskers on box plots are typically used to represent the maximum and minimum of the dataset, giving a five-number summary: upper quartile, median, lower quartile, maximum and minimum. However, there are other options. Sometimes, the whiskers reach between the 9th and 91st percentiles of the dataset instead. This program allows you to define which percentiles should give the range using the integer range slider below; for example, setting it to "9-91" will give a box plot with whiskers reaching from the 9th to the 91st percentile. To set the whiskers to the maximum and minimum of the dataset, simply set the range as 0-100.
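
To get a feel for what a percentile pair means, the sketch below computes roughly where the whiskers would end for a 9-91 setting, and how many points would fall outside them. The dataset is made up, and `np.percentile` is used here directly as a stand-in for what the plotting call works out internally.

```python
import numpy as np

data = np.arange(101)                  # made-up dataset: the integers 0 to 100
whis = (9, 91)                         # percentile pair, as read from the range slider

low, high = np.percentile(data, whis)  # whisker limits for this choice
outside = data[(data < low) | (data > high)]   # points that would be drawn individually
print(low, high, outside.size)
```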

In [None]:
# Range slider widget for the whisker percentiles
yscale = yaxisradio.value
whisker = wdg.IntRangeSlider(
    value = [0,100],
    min=0,
    max=100,
    step=1,
    description='Whiskers:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d',
)

whisker

## Plotting

The cell below will plot the graph. Outliers and data not encompassed in the whiskers are plotted as points.

In [None]:
# Plot data
plt.figure()
if gridradio.value == "Yes":   #Check radio selection for gridlines, plot if wanted
    plt.grid(True)
plt.boxplot(var, whis=whisker.value)
plt.xlabel(xlabel)      #x axis label
plt.ylabel(ylabel)      #y axis label
plt.xscale(xscale)      #x scale type
plt.yscale(yscale)      #y scale type
plt.title(title)        #Title