# Averaging a subset of data

In this notebook we'll demonstrate how to compute the average of a chunk of data taken from a large dataset.

# The Jupyter Notebook
First, we import the Python packages.

In [None]:
# python packages
import pandas as pd # DataFrames and reading CSV files
import numpy as np # Numerical library
import matplotlib.pyplot as plt # Plotting library (not used in the cells below)
from lmfit import Model # Least-squares fitting library (not used in the cells below)

We then read a data file into a DataFrame and rename its columns.

In [None]:
data = pd.read_csv("../miscData/random1.csv")
data.columns = ("X","Y")
print(data)
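If `random1.csv` is not at hand, the same two steps can be tried on a small in-memory file. This is a minimal sketch: the header `a,b` and the values below are made up for illustration, and `io.StringIO` simply lets `pd.read_csv` read a string as if it were a file.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for ../miscData/random1.csv,
# whose real header and values are not shown in this notebook
csvText = """a,b
0,0.52
1,0.13
2,0.87
3,0.44
4,0.61
"""

data = pd.read_csv(io.StringIO(csvText))
# Assigning new labels replaces the existing column names;
# the number of labels must match the number of columns
data.columns = ["X", "Y"]
print(data)
```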

The most common scenario is to compute the average of a chunk of data, discarding the initial and/or final part of the dataset. We therefore define two variables: the index of the first point to be included in the average and the total number of points to be averaged. Alternatively, one could set the index of the last point to be included in the average.
Remember that Python starts counting from zero, so the first row has index 0.

In [None]:
# Total number of points
totalNumberOfValues = len(data["Y"]) 
# First element to be included in the average
firstValue = 0
# Number of elements to be included in the average
numberOfValuesToAverage = 3 
# Last element to be included in the average
lastValue = firstValue + numberOfValuesToAverage - 1
print("Total number of points in the DataFrame        :",totalNumberOfValues)
print("First element to be included in the average    :",firstValue)
print("Last element to be included in the average     :",lastValue)
print("Number of values to be included in the average :",
      numberOfValuesToAverage)
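The two ways of describing the interval (first index plus count, or first and last index) are related by simple arithmetic. The sketch below uses a hypothetical dataset size of 10 and shows why the slices in this notebook use `lastValue+1`: Python slices exclude their stop index. The range check is an addition for illustration, not part of the original cell.

```python
# Hypothetical dataset size for this sketch
totalNumberOfValues = 10

# Option 1: first index plus number of points (as in the cell above)
firstValue = 2
numberOfValuesToAverage = 3
lastValue = firstValue + numberOfValuesToAverage - 1  # inclusive last index

# Option 2: first and last index, deriving the number of points
firstValue2 = 2
lastValue2 = 4
numberOfValuesToAverage2 = lastValue2 - firstValue2 + 1

# Guard against asking for points beyond the end of the data
if lastValue >= totalNumberOfValues:
    raise ValueError("the requested range runs past the end of the data")

# Slices stop *before* the stop index, hence the "+ 1"
indices = list(range(totalNumberOfValues))
selected = indices[firstValue:lastValue + 1]
print(lastValue, numberOfValuesToAverage2, selected)  # prints: 4 3 [2, 3, 4]
```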

Let's print the values in the second column, "Y", that correspond to the interval we have chosen, so we can check that they are what we expect.

In [None]:
values = data.iloc[firstValue:lastValue+1]["Y"].values
print(values)

* Note how in the cell above we used a different syntax for selecting the elements of the DataFrame, **iloc[:]["Y"]**. It is equivalent to the following code.
* Also note how we used **.values** to convert the selected column (a pandas Series) to a NumPy array.

In [None]:
v0 = data["Y"].values
v1 = v0[firstValue:lastValue+1]
print(v1)
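To check the equivalence directly, the sketch below compares both selections on a small hypothetical DataFrame. It also shows a third variant that selects rows and column in a single `.iloc` call (using `columns.get_loc` to turn the name "Y" into a position) and `.to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame read from random1.csv
data = pd.DataFrame({"X": [0, 1, 2, 3, 4], "Y": [0.52, 0.13, 0.87, 0.44, 0.61]})
firstValue, lastValue = 1, 3

# Chained selection, as in the notebook: rows first, then the column
v_chained = data.iloc[firstValue:lastValue + 1]["Y"].values
# Column first, then slice the resulting NumPy array
v_array = data["Y"].values[firstValue:lastValue + 1]
# Rows and column in one .iloc call, avoiding chained indexing
v_single = data.iloc[firstValue:lastValue + 1, data.columns.get_loc("Y")].to_numpy()

print(v_single)
print(np.array_equal(v_chained, v_array), np.array_equal(v_chained, v_single))
```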

We can now compute the average of the numbers in the array using the **mean** function in NumPy.

In [None]:
average = np.mean(values)
print("Average :",average)
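The arithmetic mean is just the sum of the values divided by how many there are. The sketch below, using a hypothetical three-element selection, checks `np.mean` against that definition and against the pandas `Series.mean()` method. One difference worth knowing: `np.mean` returns `nan` if any value is NaN, while `Series.mean()` skips NaN values by default (`skipna=True`).

```python
import numpy as np
import pandas as pd

# Hypothetical selection standing in for `values`
values = np.array([0.13, 0.87, 0.44])

# Definition of the mean: sum of the elements divided by their number
manualAverage = values.sum() / len(values)
average = np.mean(values)
# pandas equivalent (skips NaN by default, unlike np.mean)
seriesAverage = pd.Series(values).mean()

print(average, manualAverage, seriesAverage)
```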

For some types of statistical analysis, such as bootstrapping, we might want to select a random subset of the data, which also reduces human bias in the analysis. To do this we can use the **random.choice()** function in NumPy to create an array of _numberOfValues_ random integers between 0 and _totalNumberOfValues_ - 1; with **replace=False** no index is picked twice.
This array contains the indices of the elements that we'll pick from the full dataset.

In [None]:
# Number of values to pick at random (with replace=False it must not exceed totalNumberOfValues)
numberOfValues = 20
randomIndices = np.random.choice(totalNumberOfValues, 
                                 replace=False, 
                                 size=numberOfValues)
print(randomIndices)
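Each run of the cell above picks different indices. If the selection needs to be reproducible, one option is NumPy's newer `Generator` interface with a fixed seed. This is a sketch: the dataset size of 50 and the seed 42 are arbitrary choices for illustration.

```python
import numpy as np

totalNumberOfValues = 50  # hypothetical dataset size
numberOfValues = 20

# A seeded Generator gives the same "random" indices on every run
rng = np.random.default_rng(42)
randomIndices = rng.choice(totalNumberOfValues, size=numberOfValues, replace=False)

# Without replacement each index appears at most once,
# and all indices lie between 0 and totalNumberOfValues - 1
print(np.sort(randomIndices))
```

Note that asking for more values than there are rows with `replace=False` raises a `ValueError`.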

We can then use that array of indices to create an array with the data that we are going to average.

In [None]:
randomValues = data.iloc[randomIndices]["Y"].values
print(randomValues)
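pandas can also combine the two steps (drawing indices and selecting rows) with `DataFrame.sample`, which samples without replacement by default. A minimal sketch on hypothetical data; `random_state` fixes the seed so the selection can be repeated.

```python
import pandas as pd

# Hypothetical stand-in data with 50 rows
data = pd.DataFrame({"X": range(50), "Y": [0.01 * i for i in range(50)]})
numberOfValues = 20

# Draw 20 distinct rows; random_state makes the draw reproducible
sampledRows = data.sample(n=numberOfValues, random_state=42)
randomValues = sampledRows["Y"].to_numpy()

print(sampledRows.index.to_numpy())  # labels of the rows that were picked
print(randomValues)
```

With the default integer index the row labels coincide with the positions used by `iloc`.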

Finally, we compute the average of _randomValues_.

In [None]:
averageOfRandomValues = np.mean(randomValues)
print("Number of randomly selected values      :",numberOfValues)
print("Average of the randomly selected values :",averageOfRandomValues)
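Bootstrapping was mentioned above as one motivation for random selection. As a hedged illustration, the classic nonparametric bootstrap differs from the cells above in one key respect: each resample has the same size as the data and is drawn *with* replacement. The spread of the resampled means then estimates the standard error of the mean. The synthetic normal data, the seed and the 1000 resamples below are assumptions for this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
# Hypothetical data standing in for data["Y"].values
y = rng.normal(loc=5.0, scale=2.0, size=200)

numberOfResamples = 1000  # assumed number of bootstrap resamples
bootstrapMeans = np.empty(numberOfResamples)
for i in range(numberOfResamples):
    # Same size as the data, drawn WITH replacement
    resample = rng.choice(y, size=len(y), replace=True)
    bootstrapMeans[i] = resample.mean()

# Standard deviation of the resampled means: bootstrap standard error
bootstrapStdError = bootstrapMeans.std(ddof=1)
# Textbook estimate s / sqrt(n), for comparison
analyticStdError = y.std(ddof=1) / np.sqrt(len(y))

print("Mean of the data         :", y.mean())
print("Bootstrap standard error :", bootstrapStdError)
print("Analytic standard error  :", analyticStdError)
```

For well-behaved data like this the two error estimates should be close; the bootstrap becomes more useful for statistics that lack a simple formula.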