# Looping Over Data Sets

## Use a `for` loop to process files given a list of their names.

*   A filename is just a character string.
*   And lists can contain character strings.
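
As a minimal sketch of this idea, the loop below walks over a list of filename strings and reports how many lines each file holds. The two small files are hypothetical stand-ins written to a temporary directory so the example runs anywhere; with the real data, the list would simply hold paths such as `data/gapminder_gdp_africa.csv`.

~~~python
import os
import tempfile

# Assumption: tiny stand-in files in a temporary directory, not the real data.
workdir = tempfile.mkdtemp()
filenames = []
for region in ['africa', 'asia']:
    path = os.path.join(workdir, 'gapminder_gdp_' + region + '.csv')
    with open(path, 'w') as handle:
        handle.write('country,gdpPercap_1952\n')
        handle.write('Example,100.0\n')
    filenames.append(path)

# Each filename is a string, so the list can drive an ordinary for loop.
line_counts = []
for filename in filenames:
    with open(filename) as handle:
        line_counts.append(len(handle.readlines()))
    print(os.path.basename(filename), line_counts[-1])
~~~

The loop body does not care where the list came from; any list of strings that name readable files would work the same way.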



## Use `glob.glob` to find sets of files whose names match a pattern.

*   In Unix, the term "globbing" means "matching a set of files with a pattern".
*   The most common patterns are:
    *   `*` meaning "match zero or more characters"
    *   `?` meaning "match exactly one character"
*   Python's standard library includes the `glob` module, which provides this pattern matching.
*   The `glob` module contains a function, also called `glob`, that matches file patterns.
*   E.g., `glob.glob('*.txt')` matches all files in the current directory
    whose names end with `.txt`.
*   Result is a (possibly empty) list of character strings.
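
The behaviour of `*` and `?` can be sketched with a few empty files. The filenames below are hypothetical and are created in a temporary directory purely for illustration. Results are sorted because `glob.glob` does not promise any particular order.

~~~python
import glob
import os
import tempfile

# Assumption: empty, hypothetical files created only to have names to match.
workdir = tempfile.mkdtemp()
for name in ['notes.txt', 'todo.txt', 'data1.csv', 'data2.csv', 'data10.csv']:
    open(os.path.join(workdir, name), 'w').close()

# '*' matches zero or more characters.
txt_files = sorted(glob.glob(os.path.join(workdir, '*.txt')))

# '?' matches exactly one character, so 'data10.csv' is not included.
single_digit = sorted(glob.glob(os.path.join(workdir, 'data?.csv')))

# A pattern that matches nothing gives an empty list, not an error.
pdf_files = glob.glob(os.path.join(workdir, '*.pdf'))

print([os.path.basename(path) for path in txt_files])
print([os.path.basename(path) for path in single_digit])
print(pdf_files)
~~~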



## Use `glob` and `for` to process batches of files.

*   Helps a lot if the files are named and stored systematically and consistently
    so that simple patterns will find the right data.

*   A broad pattern can match the file holding the whole data set as well as the per-region files.
*   Use a more specific pattern in the exercises to exclude the whole data set.
*   But note that the minimum of the entire data set is also the minimum of one of the regional data sets,
    which is a nice check on correctness.
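
The points above can be sketched end to end with toy files. Everything specific here is an assumption made for illustration: the filenames, the `gdp` column, and the numbers are made up, and the helper `smallest_value` is hypothetical. The files are read with the standard-library `csv` module rather than pandas purely so the sketch is self-contained.

~~~python
import csv
import glob
import os
import tempfile

# Assumption: toy files with made-up numbers, not the real data set.
workdir = tempfile.mkdtemp()
samples = {
    'gapminder_gdp_africa.csv': [300.0, 450.0],
    'gapminder_gdp_asia.csv': [330.0, 800.0],
    'gapminder_all.csv': [300.0, 450.0, 330.0, 800.0],
}
for name, values in samples.items():
    with open(os.path.join(workdir, name), 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['country', 'gdp'])
        for number, value in enumerate(values):
            writer.writerow(['country_' + str(number), value])


def smallest_value(filename):
    """Return the smallest value in the (hypothetical) 'gdp' column."""
    with open(filename, newline='') as handle:
        return min(float(row['gdp']) for row in csv.DictReader(handle))


# Broad pattern: matches the whole data set as well as the regional files.
minima = {}
for filename in sorted(glob.glob(os.path.join(workdir, 'gapminder_*.csv'))):
    minima[os.path.basename(filename)] = smallest_value(filename)
    print(os.path.basename(filename), minima[os.path.basename(filename)])

# More specific pattern: only the regional files.
regional = sorted(glob.glob(os.path.join(workdir, 'gapminder_gdp_*.csv')))
regional_minimum = min(smallest_value(filename) for filename in regional)

# Correctness check: the overall minimum equals the smallest regional minimum.
print(minima['gapminder_all.csv'] == regional_minimum)
~~~

Because the regional files share the `gapminder_gdp_` prefix, narrowing the pattern is enough to leave out the combined file; no extra filtering code is needed.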

# Challenges

## Determining Matches

Which of these files is *not* matched by the expression `glob.glob('data/*as*.csv')`?

1. `data/gapminder_gdp_africa.csv`
2. `data/gapminder_gdp_americas.csv`
3. `data/gapminder_gdp_asia.csv`
4. 1 and 2 are not matched.

## Minimum File Size

Modify this program so that it prints the number of records in
the file that has the fewest records.

~~~python
import glob
import pandas
fewest = ____
for filename in glob.glob('data/*.csv'):
    dataframe = pandas.____(filename)
    fewest = min(____, dataframe.shape[0])
print('smallest file has', fewest, 'records')
~~~

Note that `dataframe.shape` is an attribute, not a method: it holds a tuple with
the number of rows and columns of the data frame, so `dataframe.shape[0]` is the number of rows.

## Comparing Data

Write a program that reads in the regional data sets
and plots the average GDP per capita for each region over time
in a single chart.

## Key Points
- Use a `for` loop to process files given a list of their names.
- Use `glob.glob` to find sets of files whose names match a pattern.
- Use `glob` and `for` to process batches of files.