# A set of diamonds

The data set ```diamonds.txt``` contains data on the characteristics of diamonds and their prices.  There are 5 levels of clarity in the data.  From clearest to least clear, they are IF, VVS1, VVS2, VS1, and VS2.

* IF = Internally Flawless
* VVS = Very, very slight inclusions
* VS = Very slight inclusions

We want to write a function to count the number of diamonds of each type.  Later in the course, we'll discuss how to use libraries for this type of analysis in Python.  For now, we'll ignore most of the structure of ```diamonds.txt``` and treat it as plain text.

Of course, you'll need to have ```diamonds.txt``` in the same directory as this notebook to run the demo.

In [None]:
# Read the data in diamonds.txt as a string

with open("diamonds.txt") as diamond_file:
    diamond_string = diamond_file.read()

We follow the [```numpy``` standard for documentation strings](https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#docstring-standard).

In [None]:
def clarity_count(s, clarity):
    """
    Return the number of diamonds of the indicated clarity in the given string.

    Parameters
    ----------
    s : str
        A string containing the description of some diamonds.
    clarity : int
        An integer between 0 and 4 (inclusive).
        0 indicates 'IF', 1 is 'VVS1', 2 is 'VVS2', 3 is 'VS1', and 4 is 'VS2'.

    Returns
    -------
    clar_count : int
        Number of whitespace-separated items in s equal to the clarity code.
    """

    clar_string_list = ['IF', 'VVS1', 'VVS2', 'VS1', 'VS2']
    # Count whole items rather than substrings: s.count('VS1') would also
    # match the 'VS1' inside every 'VVS1'.
    clar_count = s.split().count(clar_string_list[clarity])
    return clar_count

In [None]:
# We can use the ? to display the docstring for the function.

clarity_count?

In [None]:
# We test our function on a short string

clarity_count("IF VVS1 IF", 0)

In [None]:
# How often does "IF" appear in diamonds.txt?

clarity_count(diamond_string, 0)

In [None]:
# How often does each clarity level appear in diamonds.txt?

for i in range(5):
    print(clarity_count(diamond_string, i))
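
One caveat when counting clarity codes in raw text: ```str.count``` counts every occurrence of a pattern, including occurrences inside longer codes. Because 'VS1' appears inside 'VVS1', a substring count for 'VS1' also picks up every 'VVS1'. The sketch below contrasts substring counting with counting whole items. It uses a made-up sample string (not data from ```diamonds.txt```) and assumes the codes are separated by whitespace, as in the short test string above.

```python
# A made-up sample string (hypothetical, not taken from diamonds.txt)
sample = "IF VVS1 VS1 VVS2 VS2 VS1"

# Substring counting: 'VS1' also matches inside 'VVS1'
substring_count = sample.count("VS1")

# Item counting: split on whitespace, then count exact matches
token_count = sample.split().count("VS1")

print(substring_count, token_count)
```

Here the substring count is 3 while the item count is 2, so the two approaches can disagree whenever one code is contained in another.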

Let's improve our ```clarity_count()``` function in two ways: we'll add a default clarity of 0, and we'll display an error message if the specified clarity number is not an integer between 0 and 4.

Note that Python has a built-in exception mechanism for reporting errors.  Printing a warning and returning nothing, as we do here, is fine for informal data analysis, but not for production code.

In [None]:
def improved_clarity_count(s, clarity=0):
    """
    Return the number of diamonds of the indicated clarity in the given string.

    Parameters
    ----------
    s : str
        A string containing the description of some diamonds.
    clarity : int, optional
        An integer between 0 and 4 (inclusive).  Default value is 0.
        0 indicates 'IF', 1 is 'VVS1', 2 is 'VVS2', 3 is 'VS1', and 4 is 'VS2'.

    Returns
    -------
    clar_count : int or None
        Number of whitespace-separated items in s equal to the clarity code,
        or None if clarity is invalid.
    """

    # Exact type check, so booleans are rejected as well
    if type(clarity) is not int:
        print("Warning: clarity must be an integer.")
        return None

    if clarity < 0 or clarity > 4:
        print("Warning: clarity must be between 0 and 4, inclusive.")
        return None

    clar_string_list = ['IF', 'VVS1', 'VVS2', 'VS1', 'VS2']
    # Count whole items rather than substrings: s.count('VS1') would also
    # match the 'VS1' inside every 'VVS1'.
    clar_count = s.split().count(clar_string_list[clarity])
    return clar_count

In [None]:
improved_clarity_count("IF IF IF")

In [None]:
improved_clarity_count("IF IF IF", 7)

In [None]:
improved_clarity_count("IF IF IF", "I like cats")

In [None]:
# With an invalid clarity, the function prints a warning and returns None
n = improved_clarity_count("IF IF IF", "I like cats")
print(n)
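
For comparison, the more conventional way to report bad arguments in Python is to raise an exception instead of printing a warning. The following is a minimal sketch of that approach, not part of the original notebook: the name ```strict_clarity_count``` is hypothetical, and it assumes the clarity codes are separated by whitespace.

```python
def strict_clarity_count(s, clarity=0):
    """Count diamonds of the given clarity, raising on invalid input."""
    clar_string_list = ['IF', 'VVS1', 'VVS2', 'VS1', 'VS2']

    # bool is a subclass of int, so reject it explicitly
    if not isinstance(clarity, int) or isinstance(clarity, bool):
        raise TypeError("clarity must be an integer")
    if not 0 <= clarity <= 4:
        raise ValueError("clarity must be between 0 and 4, inclusive")

    # Count whole whitespace-separated items (an assumption about the data)
    return s.split().count(clar_string_list[clarity])


# The caller decides how to handle the error
try:
    strict_clarity_count("IF IF IF", 7)
except ValueError as err:
    print("Caught:", err)
```

With exceptions, an invalid call cannot silently produce ```None``` that later code mistakes for a count.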

# Lambda Functions

Python allows us to create "lambda functions".  These are anonymous functions: short, single-expression functions that we define inline without giving them a name.  Lambda functions can be useful if we want to apply a short function to all elements of a list or other data structure.

To demonstrate lambda functions, we need to create a list to work with.

In Python, it's easy to iterate over the lines of a file.  Doing so reads one line at a time instead of loading the whole file into a single string, which makes it easier to work with large files.

In [None]:
# Make a list of the items in the diamonds.txt file
diamond_list = []
with open("diamonds.txt") as diamond_file:
    for line in diamond_file:
        diamond_list.append(line.split())

# display the first few lines of the resulting list
diamond_list[0:10]

In [None]:
# Drop the header row (the first line of the file).
# Run this cell only once: each run removes another row from the list.
diamond_list.pop(0)
diamond_list[0:10]

In [None]:
# Use a list comprehension to create a list of the carats in each diamond.
# float converts a string such as "0.3" to a number.  It is safer than eval,
# which would execute any Python expression found in the file.

carat_list = [float(d[0]) for d in diamond_list]
print(carat_list)

The points $p$ of a diamond are related to its carats $c$ by $p = 100c$.

We use a lambda function and the built-in ```map``` function to convert our list of carats to a list of points.

In [None]:
point_list = list(map(lambda c: 100*c, carat_list))
print(point_list)

We could have created the same list using a list comprehension.

In [None]:
print([100*c for c in carat_list])

One difference between the two approaches is that ```map``` returns an iterator (which we then converted to a list using the ```list``` function), while a list comprehension directly returns a list.  An iterator produces its values one at a time, on request.  A list has to be stored in memory in full, which may be problematic if we have a lot of data.
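
To make the memory point concrete, here is a small sketch using a made-up carat list (```sample_carats``` is hypothetical, not data from ```diamonds.txt```). It shows that a ```map``` object computes values only when asked, and that a generator expression is the lazy counterpart of a list comprehension.

```python
# A made-up list of carats (hypothetical values)
sample_carats = [0.5, 1.0, 1.5]

# map returns a lazy iterator; nothing is computed yet
lazy_points = map(lambda c: 100 * c, sample_carats)

# Request only the first value
first_point = next(lazy_points)

# The iterator remembers its position: only the rest is left
remaining_points = list(lazy_points)

# A generator expression is also lazy, so sum never builds a full list
total_points = sum(100 * c for c in sample_carats)

print(first_point, remaining_points, total_points)
```

Note that an iterator can be consumed only once, so converting it to a list is still the right choice when the values are needed repeatedly.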

In [None]:
# Let's extract the diamonds with less than 100 points in two ways.
# First we use the filter command.
# Note that our lambda function here returns a boolean value, True or False

print(list(filter(lambda x: x<100, point_list)))

In [None]:
# Next we use a list comprehension.
print([x for x in point_list if x<100])