# Importing Large Data Files

Here we will look at the ICP-OES data file, make sure we understand its structure, and brainstorm how to organize the data so that next week's analysis is easier.

In [2]:
# Import all the functions needed
import numpy as np
import scipy.stats as stats
import pandas as pd
import math

# read the whole data file in:
data = pd.read_csv("hannah_analiese_atomic_emission.csv", header = 6)

print(data)

          Label Rack:Tube    Type        Date Time     Element Element Label  \
0         Blank       2:1     BLK  2/27/2023 17:08  Cu 327.395            Cu   
1         Blank       2:1     BLK  2/27/2023 17:08  Fe 238.204            Fe   
2         Blank       2:1     BLK  2/27/2023 17:08  Pb 217.000            Pb   
3    Standard 1       2:2     STD  2/27/2023 17:10  Cu 327.395            Cu   
4    Standard 1       2:2     STD  2/27/2023 17:10  Fe 238.204            Fe   
5    Standard 1       2:2     STD  2/27/2023 17:10  Pb 217.000            Pb   
6    Standard 2       2:3     STD  2/27/2023 17:12  Cu 327.395            Cu   
7    Standard 2       2:3     STD  2/27/2023 17:12  Fe 238.204            Fe   
8    Standard 2       2:3     STD  2/27/2023 17:12  Pb 217.000            Pb   
9    Standard 3       2:4     STD  2/27/2023 17:13  Cu 327.395            Cu   
10   Standard 3       2:4     STD  2/27/2023 17:13  Fe 238.204            Fe   
11   Standard 3       2:4     STD  2/27/

## Step 1 - Inspect the data file

Look at the data printed above. Make sure the file is being read correctly and that this looks like your data. You can compare it to the CSV file opened in Excel.
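One way to confirm that the file was read correctly, instead of scanning the whole printout, is to check the column names and the size of the table. The sketch below does not use the real instrument file: it builds a small in-memory stand-in whose six metadata lines are hypothetical placeholders, followed by a header row and the three Blank rows. It also illustrates what `header = 6` does, namely that pandas ignores the first six lines and takes the seventh as the column headings.

```python
import io
import pandas as pd

# Stand-in for the instrument export (NOT the real file):
# six placeholder metadata lines, then the header row, then three Blank rows.
fake_file = io.StringIO(
    "metadata line 1\n"
    "metadata line 2\n"
    "metadata line 3\n"
    "metadata line 4\n"
    "metadata line 5\n"
    "metadata line 6\n"
    "Label,Type,Element,Intensity\n"
    "Blank,BLK,Cu 327.395,262.77\n"
    "Blank,BLK,Fe 238.204,34.79\n"
    "Blank,BLK,Pb 217.000,3.41\n"
)

# header=6 means the line at index 6 (the seventh line) holds the column names
demo = pd.read_csv(fake_file, header=6)

column_names = list(demo.columns)   # the headings pandas found
n_rows, n_cols = demo.shape         # number of data rows and columns

print(column_names)
print(f"{n_rows} rows, {n_cols} columns")
print(demo.head())                  # first few rows only
```

If the headings printed here were metadata text instead of names like `Label` and `Element`, that would suggest the `header` value does not match the file.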

Answer the following questions:
1. What are the most important columns to pay attention to when trying to read this data? Look at the headings and prioritize the top 2 or 3 columns that have the most critical information in them.


2. How is the data organized in this file? Where are the different standards and samples stored, and where are the different elements stored? 



## Step 2 - Use Python to organize the data

In [3]:
# The pandas package in Python is valuable for working with large spreadsheets because it lets us call data by the heading of the column!
# Try it here - we're going to read all of the unique names under the heading "Label"

# read all the different sample labels:
#print(F"all of the data in the column titled Label is {data.Label}")
      
unique_label = data.Label.unique()

n_label = len(unique_label)

print(f"There are {n_label} unique sample labels in this data set")

print(f"Each unique label is listed here: {unique_label}")




There are 14 unique sample labels in this data set
Each unique label is listed here: ['Blank' 'Standard 1' 'Standard 2' 'Standard 3' 'Standard 4' 'Standard 5'
 'Standard 6' 'Standard 7' 'Standard 8' 'Standard 9' 'Standard 10'
 'Sample 1' 'Sample 2' 'Sample 3']
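A related check, sketched here on a small stand-in table instead of the real file, is to count how many rows carry each label. The table below copies only the Label and Element values from the first six rows of the printout; the name `demo` is just for illustration. The sketch also uses bracket notation, `demo["Label"]`, which does the same job as `demo.Label` but also works for headings that contain spaces or punctuation, such as `Element Label` or `Rack:Tube`.

```python
import pandas as pd

# Stand-in table: Label and Element values from the first six rows of the printout
demo = pd.DataFrame({
    "Label": ["Blank"] * 3 + ["Standard 1"] * 3,
    "Element": ["Cu 327.395", "Fe 238.204", "Pb 217.000"] * 2,
})

# nunique() gives the same count as len(demo["Label"].unique())
n_label = demo["Label"].nunique()

# value_counts() reports how many rows share each label
rows_per_label = demo["Label"].value_counts()

print(n_label)
print(rows_per_label)
```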


In [3]:
# Based on the example code above, make a new list called "Elements" that contains every unique element wavelength (the values in the Element column).


# Print the list and make sure it has all of our elements in it, and that they match the expected wavelengths!




In [4]:
# This turns out to be really useful, because we can also ask Python to find certain values or words in a given list

# for example, let's have it give us all of the Intensity values for rows labeled Blank

for x in range(len(data.Label)):
    if data.Label[x] == "Blank":
        print(data.Intensity[x])

262.77
34.79
3.41
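The loop above checks one row at a time. pandas can also make the same selection in a single step with a boolean mask, which is a common alternative and not a requirement for this exercise. The sketch below runs on a small stand-in table: the three Blank intensities are the values printed above, while the Standard 1 intensities are made-up placeholders, not real measurements.

```python
import pandas as pd

# Stand-in table. Blank intensities come from the output above;
# the Standard 1 intensities are placeholders, NOT real measurements.
demo = pd.DataFrame({
    "Label": ["Blank", "Blank", "Blank", "Standard 1", "Standard 1", "Standard 1"],
    "Element": ["Cu 327.395", "Fe 238.204", "Pb 217.000"] * 2,
    "Intensity": [262.77, 34.79, 3.41, 1000.0, 2000.0, 3000.0],
})

# True for every row whose Label is exactly "Blank", False otherwise
is_blank = demo["Label"] == "Blank"

# Keep only the Blank rows, and only the two columns we care about
blank_rows = demo.loc[is_blank, ["Element", "Intensity"]]
blank_intensities = blank_rows["Intensity"].tolist()

print(blank_rows)
```

Keeping the Element column next to each intensity makes it easier to see which wavelength each blank reading belongs to.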


In [None]:
# Repeat the process above, but this time print the Intensity values for all rows that are standards



## Step 3 - Brainstorm a strategy for processing this data

1. Write out in words, in the block below, how you will use the data we collected to determine the concentration of the unknown samples.

2. Based on what you have learned in previous classes, how might you assess whether or not our data is accurate or precise? Are there qualitative observations that you might make? What would you be looking for? Do you know of any numbers that might be helpful in assessing the quality of this data? (You should know at least one that we've seen before, but if you have encountered others, mention them here!)
