# Reading in files 

So you have done an experiment and have a file full of data. Now we need to read (import) that data into Python so we can work with it.

In [None]:
#First we import a few packages we will need 
import numpy as np               # for dealing with math
import pandas as pd              # for tables
import matplotlib.pyplot as plt  # for plotting
%matplotlib inline

In [None]:
dataTable = pd.read_csv('ExpData.txt')

dataTable # printing the table

In [None]:
dataTable = pd.read_csv('ExpData.txt',sep='\t') # Telling pandas the columns are separated by tabs

dataTable # printing the table

In [None]:
dataTable = pd.read_csv('ExpData.txt',sep='\t',header=1) # Setting what row to read as our heading
dataTable.columns = ["t", "v", "v1"] # renaming the column headings

dataTable # printing the table
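
If you want to see what `sep` and `header` do without needing the data file, here is a minimal sketch that reads a small made-up table from a string. The file contents and the title line are assumptions for illustration only; they are not the contents of ExpData.txt.

```python
import io
import pandas as pd

# Hypothetical file contents: a title line, a heading line, then tab-separated rows.
raw_text = (
    "Example export\n"
    "time\tchannel 1\tchannel 2\n"
    "0.0\t0.10\t1.0\n"
    "0.1\t0.20\t1.1\n"
    "0.2\t0.15\t1.2\n"
)

# io.StringIO lets read_csv treat a string as if it were an open file.
# header=1 uses the second line (counting from 0) as the headings and
# discards the line before it.
demo = pd.read_csv(io.StringIO(raw_text), sep="\t", header=1)
demo.columns = ["t", "v", "v1"]  # same renaming step as above

print(demo)
```

Changing `header=1` to `header=0` here shows how the title line gets in the way of the headings.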

In [None]:
# If I don't want one of the columns, I can just resave the table excluding the column I don't care about.
dataTable = dataTable[["t", "v"]]
dataTable

In [None]:
# A quick and dirty plot of our data
plt.plot(dataTable.t,dataTable.v)

In [None]:
#another look using dots instead of lines
plt.plot(dataTable.t,dataTable.v,'.')

It looks like our data needs some cleaning up: some time values have more than one voltage value.
So let's try to fix this. I search Google for "pandas merge values if column is same"; Google doesn't care about proper grammar.

This turns up this page:
https://stackoverflow.com/questions/46826773/how-can-i-merge-rows-by-same-value-in-a-column-in-pandas-with-aggregation-func

This brings the aggregate and groupby functions to our attention. From here we could look them up in the pandas documentation, but I will instead try my luck using the example as a base.

In [None]:
#aggregation_functions = {'price': 'sum', 'amount': 'sum', 'name': 'first'}
#df_new = df.groupby(df['id']).aggregate(aggregation_functions)

fixedTable = dataTable.groupby(dataTable['t']).aggregate({'v':'mean'})
fixedTable

In [None]:
fixedTable = dataTable.groupby(dataTable['t'],as_index=False).aggregate({'v':'mean'}) # We don't want to use t as an index
fixedTable
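
To see what the two `groupby` calls above are doing, here is a small sketch on made-up numbers (the values are assumptions chosen so the means are easy to check by hand). Rows sharing the same `t` are collapsed into one row holding the mean of `v`.

```python
import pandas as pd

# Made-up data with repeated time values
demo = pd.DataFrame({
    "t": [0.0, 0.0, 0.1, 0.1, 0.2],
    "v": [1.0, 3.0, 2.0, 4.0, 5.0],
})

# Default behaviour: t becomes the index of the result
by_index = demo.groupby("t").aggregate({"v": "mean"})

# as_index=False keeps t as an ordinary column, so demo_fixed.t still works
demo_fixed = demo.groupby("t", as_index=False).aggregate({"v": "mean"})

print(by_index)
print(demo_fixed)
```

The second form is the one that is convenient for plotting, because both `t` and `v` are still ordinary columns.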

In [None]:
#Getting rid of the zero offset
print(fixedTable.v.mean())
fixedTable.v = fixedTable.v - fixedTable.v.mean()
fixedTable

In [None]:
plt.plot(fixedTable.t,fixedTable.v,'.')

# Fitting to the data

Let's say we wanted to get the frequency of this signal.
We are expecting a sine or cosine, so we Google "python fit sine to data"
and get https://scipy-lectures.org/intro/scipy/auto_examples/plot_curve_fit.html,
which just so happens to have a perfect example.

In [None]:
# assign our data to the variable names used in the copy-pasted example
x_data = fixedTable.t
y_data = fixedTable.v

# everything below here is copy pasted
from scipy import optimize
def test_func(x, a, b):
    return a * np.sin(b * x)

params, params_covariance = optimize.curve_fit(test_func, x_data, y_data, p0=[0.1, 3]) #tweaked the initial guess (p0) values

print(params)


plt.figure(figsize=(6, 4))
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, test_func(x_data, params[0], params[1]), label='Fitted function')

plt.legend(loc='best')

plt.show()

In [None]:
# We can't see anything, so let's modify the plot

plt.figure(figsize=(15, 4)) #increase horizontal width
plt.scatter(x_data, y_data, label='Data')

plt.plot(x_data, test_func(x_data, params[0], params[1]), "k", label='Fitted function') #make the line black

#plt.legend(loc='best')

plt.show()

The fit doesn't seem to be working.

There could be a few reasons:
- The data is too noisy
- The initial guesses are too far from the true value
- The function we are fitting with is not right
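
For the second possibility in the list above, one option is to estimate the starting values from the data rather than guessing them by hand. The sketch below is not part of the original example: it uses a Fourier transform (`np.fft.rfft`) to find the strongest frequency, and it assumes the time values are evenly spaced. The data here is made up, with a known angular frequency of 3, so we can check the answer.

```python
import numpy as np
from scipy import optimize

def cosine(x, a, b, c):
    return a * np.cos(b * x + c)

# Made-up, evenly sampled data standing in for a real measurement
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 1000, endpoint=False)
v = 0.5 * np.cos(3.0 * t + 0.4) + rng.normal(0.0, 0.05, t.size)

# Find the strongest frequency in the data (assumes even spacing in t)
dt = t[1] - t[0]
spectrum = np.fft.rfft(v - v.mean())
freqs = np.fft.rfftfreq(t.size, d=dt)          # in cycles per unit time
peak = np.argmax(np.abs(spectrum[1:])) + 1     # skip the zero-frequency bin

a0 = (v.max() - v.min()) / 2                   # rough amplitude
b0 = 2 * np.pi * freqs[peak]                   # convert to angular frequency
c0 = np.angle(spectrum[peak]) - b0 * t[0]      # rough phase, referred to t = 0

params, params_covariance = optimize.curve_fit(cosine, t, v, p0=[a0, b0, c0])

print("initial guess:", [a0, b0, c0])
print("fitted values:", params)
```

The frequency found this way is only as fine as the spacing of the Fourier bins, so treat it as a starting point for `curve_fit`, not as the final answer.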


It turns out that fitting a periodic function to data that does not start and end neatly at the same point in its cycle causes issues. So we will trim our data to see if that works.

In [None]:
pd.options.display.max_rows = 500 # lets us see more (500) pandas rows
fixedTable[:500]

In [None]:
fixedTable[abs(fixedTable.v) < 0.01] 
# This gives us all the data points with an absolute value less than 0.01, which is close enough to zero for our purposes

In [None]:
trimmedTable = fixedTable[76:370] # we pick some zeros to start and end our data at

plt.figure(figsize=(15, 4))
plt.plot(trimmedTable.t,trimmedTable.v,'.') # plot our data to see if it worked
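
The filter-then-slice steps above can be tried on a small made-up signal where the zeros are known in advance. This is only a sketch: the sine wave and the row numbers are assumptions for illustration, not values from the experiment.

```python
import numpy as np
import pandas as pd

# Made-up signal: two full periods of a sine wave sampled every 0.1
t = np.linspace(0.0, 2.0, 21)
demo = pd.DataFrame({"t": t, "v": np.sin(2 * np.pi * t)})

# Boolean filter: keep only the rows where v is close to zero
near_zero = demo[demo.v.abs() < 0.01]
print(near_zero)

# Pick two of those rows as the start and end of the trimmed data.
# .iloc slices by row position and leaves out the end row, hence the + 1.
start = near_zero.index[1]
stop = near_zero.index[-2]
trimmed = demo.iloc[start:stop + 1]
print(trimmed)
```

Reading the row numbers off `near_zero.index` saves scrolling through the whole table to find them.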

In [None]:
x_data = trimmedTable.t
y_data = trimmedTable.v

from scipy import optimize
def test_func(x, a, b, c):
    return a * np.cos(b * x + c) #added one more parameter to account for the x shift

params, params_covariance = optimize.curve_fit(test_func, x_data, y_data, p0=[max(trimmedTable.v), 3, 1])
# give it some decent starting parameters

print(params)

plt.figure(figsize=(15, 4)) #increase horizontal width
plt.scatter(x_data, y_data, label='Data')

plt.plot(x_data, test_func(x_data, params[0], params[1], params[2]), "k", label='Fitted function')

plt.show()

If you are happy with all this, then try loading the CurrentVoltage file and getting a line of best fit for it.

If you have done that as well and still want to learn more about curve fitting, have a look at mcba12-fitting-a-straight-line from Michael's section.