# Using BagIt

This notebook is set up with blanks for a teaching demo. If you want to see the 
entire notebook, with examples of how each cell would or should look when it has run,
see [`01a-using-bagit-completed.ipynb`](01a-using-bagit-completed.ipynb).

## Learning Objectives

After completing this lesson, students should be able to:

* Understand and explain the BagIt specification, and discuss how a BagIt object is structured.
* Use the `bagit` Python module to create a BagIt bag, which includes fixity, manifest, and basic descriptive information.
* Use shell tools (`ls`, `cat`) to conduct initial checking and validation of a BagIt object.
* Identify particular use cases and digital curation activities or problems in which BagIt may be a useful tool.

## Setup

Now let's look at how to create a BagIt object from some sample files on your computer, 
using a Python module called `bagit`. If you want to follow along, the instructions below 
walk through the process step by step for a folder of sample files in this Git repository. 

If you don't already have the bagit library installed, you may need to get it. You can run the 
following cell to install it with pip, by uncommenting the last line (remove the `#`) and then running the cell.

In [None]:
# If you don't have bagit installed, install following instructions at https://github.com/LibraryOfCongress/bagit-python
# Alternatively, you can run the pip command on the line below by removing the leading # and running the cell.
# (When the command below runs, you will see response output appear below this cell as the program downloads and installs.)
#!pip install bagit

To begin this activity, set up by importing the library:

In [None]:
# import the library


In [None]:
# to demonstrate automated creation of metadata, also import the date function


## Bag the Files

Steps: 

1. Look at what's there before you start
1. Create BagInfo metadata
1. Use `bagit` to make the bag

### 1. Look at what's there

We will use the bagit tool to create a valid BagIt object from the directory called `sample-files`. First, take a look at what's in this directory.

- Note: run shell commands from the notebook by putting an exclamation point character at the beginning of the line

In [None]:
# use the list command to see what's there


- You should see five folders and one CSV file.

### 2. Create BagInfo metadata

Using the Python `bagit` library, we can create “BagInfo” information by using a Python dictionary. This example creates a dictionary of the bag information called `my_BagInfo`, which will be passed as an argument during bag creation. If you use this code, replace the information below with the information appropriate to the project you’re working on.

### Bonus: automate date info

The `date` functionality imported earlier is enough to create date information. If you run the following block, it should return your system's current date.

In [None]:
# create the dateStamp variable, check the type
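
One possible way to fill in this cell is sketched below. It assumes the earlier import was `from datetime import date`, which the blank import cell does not confirm; the variable name `dateStamp` matches the one used later in the BagInfo dictionary.

```python
from datetime import date

# Capture today's date from the system clock
dateStamp = date.today()

# date.today() returns a datetime.date object, not a string
print(dateStamp, type(dateStamp))

# str() on a date gives the ISO 8601 form YYYY-MM-DD, the same as isoformat()
print(str(dateStamp) == dateStamp.isoformat())
```

Because `str(dateStamp)` already produces the `YYYY-MM-DD` form, no extra formatting step is needed before the value goes into the bag metadata.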


Note that the above is a Python `date` object (from the `datetime` module), so for purposes of our BagIt activity, it must be converted to a string:

In [None]:
# create baginfo data

my_BagInfo = {
    'Source-Organization': 'Data Curation Training Pros, via Library of Congress (LC)',
    'Contact-Name' : 'Anonymous', # <- type your name here
    'Contact-Email': 'hello@some.email', # <- type your email here
    'External-Description': 'These are sample files from the Library of Congress Web Archives that we wanted to structure in BagIt for practice.',
    'External-Identifier': 'myfiles:documents/test/files/1234', # <- this would be something like a call number or collection ID, if the content corresponds to a catalog description or digitized item
    'Source-URL': 'https://www.loc.gov/programs/web-archiving/about-this-program/', #this is a reference URL for the collection, in this case doesn't point to each individual file
    'Collected-Date': '2021-10-12',
    'Demonstration-Date': str(dateStamp) #string of date formatted following ISO date standard format YYYY-MM-DD
}

print('Bag Info:\n\n',
      my_BagInfo,
     '\n\nDatatype: ',type(my_BagInfo))

### 3. Use `bagit` to make the bag: `make_bag()`

The bagit module includes a function called `make_bag()` to create BagIt objects from a specific path or directory. We will set up the function by providing as arguments the location of the files that we want to bag (`sample-files`), with the `bag_info` option to create unique descriptive information using the `my_BagInfo` dictionary:

In [None]:
# create the bag; note that the tool does not give feedback, so use a try/except 
# to create the effect of giving a response message
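
A minimal sketch of what this cell could look like follows. The helper name `bag_directory` is hypothetical and exists only for this illustration; the call itself is `bagit.make_bag()`, wrapped in a try/except so that the cell reports what happened. The exception types listed are the ones this sketch assumes are most likely for a missing directory or a permissions problem.

```python
from pathlib import Path


def bag_directory(directory, bag_info):
    """Bag `directory` in place; return the Bag object, or None on failure.

    `bag_directory` is a hypothetical helper name used for this sketch only.
    """
    # Third-party module (pip install bagit); imported here so the sketch
    # stands alone even if the earlier import cell has not been run
    import bagit

    try:
        # make_bag() moves the existing contents into a new data/ directory
        # and writes the tag files and manifests next to it
        bag = bagit.make_bag(str(directory), bag_info)
    except (bagit.BagError, RuntimeError, OSError) as err:
        # make_bag() is silent on success, so report failures explicitly
        print(f"Bagging failed: {err}")
        return None
    print(f"Bag created at {Path(directory).resolve()}")
    return bag
```

In the notebook, this would be called as `my_bag = bag_directory('sample-files', my_BagInfo)`, which leaves the bag available in the `my_bag` variable.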



If the cell runs and you don't see the error message, a bag has been created.
It is accessible as a Python object in the `my_bag` variable. 
(More about this later!)
But before we move on, think about the structure of the BagIt object we previously discussed. 
If you created a bag out of the `sample-files` directory, how do you think it has changed? 

- What files would you expect to see in the directory now?
- What additional folder or directory might you expect to see?
- Where would you expect to find the files that were bagged?

Now, take a look at the `sample-files` directory. If the above cell ran correctly and did not return any errors, you should see changes in the `sample-files` directory. 

In [None]:
# display the contents of sample-files directory


- What changes do you see? 

### What's in the Bag?

To get an idea of whether this is a complete bag, you can explore the BagIt object and its data using shell commands: 

* Use the shell list command (`ls`) to see if the required bagit structure and files have been created
* Use the `cat` command to display the contents of a file
* Use the `wc` command to count bytes, words, or lines of a file

_Hint: remember that you can use the `!` at the beginning of a line to run a shell program within the notebook._
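
For comparison, the same three checks can be sketched in plain Python with `pathlib`. The function name `inspect_bag` and the shape of the dictionary it returns are assumptions made for this illustration; they are not part of the `bagit` module.

```python
from pathlib import Path


def inspect_bag(bag_dir):
    """Print a quick structural summary of a bag and return it as a dict."""
    bag_dir = Path(bag_dir)

    # Equivalent of `ls`: names at the top level of the bag
    top_level = sorted(p.name for p in bag_dir.iterdir())
    print("Top level:", top_level)

    # Equivalent of `cat`: show the tag files if they exist
    for name in ("bagit.txt", "bag-info.txt"):
        tag_file = bag_dir / name
        if tag_file.is_file():
            print(f"--- {name} ---")
            print(tag_file.read_text(encoding="utf-8"))

    # Equivalent of `wc -l`: count the entries in each payload manifest
    manifest_lines = {
        m.name: len(m.read_text(encoding="utf-8").splitlines())
        for m in sorted(bag_dir.glob("manifest-*.txt"))
    }
    print("Manifest line counts:", manifest_lines)

    return {
        "has_declaration": (bag_dir / "bagit.txt").is_file(),
        "has_payload_dir": (bag_dir / "data").is_dir(),
        "manifest_lines": manifest_lines,
    }
```

Calling `inspect_bag('sample-files')` after bagging should show the tag files, the payload directory, and one manifest line per payload file.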

In [None]:
# check to see, is this bagit? Display the contents of the sample-files directory:


In [None]:
# check to see, is this bagit? First test is whether or not there's a bagit declaration. Do you see bagit.txt?


In [None]:
# is this bagit? are there bag tags, specified in the bag-info.txt file? do they appear to be valid key:value combinations?


- Is this the same information that you put in the bag info dictionary?
- What information is here that wasn't in the `my_BagInfo` dictionary?

You can also read the contents of `sample-files/manifest-sha256.txt`:

In [None]:
# is this bagit? is there a manifest that lists checksums and files? how many lines?


In [None]:
# check to see, is this bagit? Is there a data directory? (aka "payload" in the BagIt docs)


- The `data` directory should contain the files and folders that were previously at the top level of `sample-files`.

- For further description of the methods available on Python `bagit` objects, see the module documentation at https://github.com/LibraryOfCongress/bagit-python  

A more extensive lesson on this topic would include further explanation of tools
within `bagit` that a digital curator may use to check bags, how to research
errors that may occur, and how to update bag manifests when content is changed.
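
To illustrate what such a fixity check involves, here is a minimal sketch that uses only the standard library. It is not how `bagit` itself implements validation (the module provides its own validation methods on the bag object, such as `is_valid()`); the function name `verify_sha256_manifest` is hypothetical, and the sketch assumes a `manifest-sha256.txt` whose lines each hold a checksum followed by a path relative to the bag root, without any percent-encoded characters.

```python
import hashlib
from pathlib import Path


def verify_sha256_manifest(bag_dir):
    """Recompute SHA-256 for each manifest entry; return a list of problems."""
    bag_dir = Path(bag_dir)
    problems = []
    manifest = bag_dir / "manifest-sha256.txt"

    for line in manifest.read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry:
            continue
        # Each entry is "<checksum> <path relative to the bag root>"
        expected, rel_path = entry.split(maxsplit=1)
        payload_file = bag_dir / rel_path
        if not payload_file.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        # Reads the whole file into memory; large files would be hashed in chunks
        actual = hashlib.sha256(payload_file.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    return problems
```

An empty list means every file named in the manifest is present and unchanged; any other result names the files that are missing or have been altered since the bag was created.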

## Conclusion

The above activity demonstrates the steps to create fixity information, file manifests, and associated descriptive information (**basic preservation metadata**) for a group of files. Using an agreed-upon file packaging specification like BagIt allows digital curators 
to create information packages that carry basic information about their contents, and 
helps organizations exchanging content confirm that the content received is the content that was sent.
Moreover, keeping this information together gives a repository, its maintainers, and its users 
some assurance that the information they retrieve now is the same as what was originally received.

## Resources

See these additional resources for more detailed information:
* B. Lazorchak, ["From There to Here, from Here to There, Digital Content is Everywhere!"](https://blogs.loc.gov/thesignal/2012/01/from-there-to-here-from-here-to-there-digital-content-is-everywhere/), _The Signal_ (3 January 2012).
* State Archives of North Carolina, "[Bagger GUI User Guide](https://files.nc.gov/dncr-archives/documents/files/using_bagger.pdf)" (Updated 2012, v. 1.5), available as of March 2018.
* M. Phillips, ["What do we put in our BagIt bag-info.txt files?"](https://vphill.com/journal/post/4142/) (2015).
* UNT Libraries, "[UNT OAIS Information Package Specification](https://www.library.unt.edu/sites/default/files/documents/digital-libraries-uploads/Appendix_M_UNT_Libraries_OAIS_Information_Package_Specification.pdf)" (2015).