# Getting Started

Hello there. Welcome to Introduction to Text Analytics. For many of you, there is a fair amount of infrastructure we need to build in order for us to get work done. The good news is:

- Once we are done, you will not have to deal with much of it again. A lot of things will simply become *routine* for you.
- We are working the way people out in the world work: using common, open source tools like Python and Git, along with GitHub, the leading platform for Git. GitHub is now owned by Microsoft, so there is plenty of integration with Office. (You can lament the fact that Office is the mainstream productivity suite, but it is what it is.)

We need to make sure you have:

1. a GitHub account
2. GitHub Desktop installed
3. your GitHub repos located appropriately on your machine

In addition, you need to have:

1. **Miniconda** installed
2. an environment set up that uses Python 3.10
3. the ability to set that as your working environment

This last bit may require some editing of bash/Z-shell (macOS) or command shell (Windows) config files.

## Setting Up Our Environment

One of the first things you are going to do with your conda environment dedicated to this class is install Jupyter Lab and the NLTK. It used to be that you had to install a lot of other libraries/modules first, but the genius of **conda** is that it handles all that for us: we tell it what we want to install and it tells us what else needs to be installed. We type **Y** to make it so, and conda sets about doing all the work.

So, first, let's create an environment in which we will work: this will allow us not to mess with the base installation of conda, so if anything goes wrong, and things will go wrong, we don't have to start completely over. We will only have to create another environment (feeling free to delete the old environment if you like). 

A quick note first: this is a Jupyter Lab notebook (from here on simply *notebook*), which displays readily as a web page on GitHub and which you can also download and interact with offline. If you click to the left of a paragraph, you will see the cell within which it is located.

There are two kinds of cells in a notebook: text cells like this one, which can and should be formatted using [markdown](https://www.markdownguide.org/cheat-sheet/), and code cells like the one below, which allow you to write and run code.

*If you are looking at this notebook on your own machine, you can double-click any cell to enter edit mode. (There are also a lot of keyboard shortcuts available: look under the notebook's help tab for more.)*

```python
# And this is a code cell.
# Anything you don't want to run needs to be a comment.
# Comments in Python begin with a hash character.
# Longer ones should sit on lines by themselves
# (and be no longer than 80 characters).

# Short comments can go at the end of a line of code:
with open('../data/mdg.txt', 'r') as f:
    mdg = f.read()  # Open and read a file

type(mdg)  # Find out what kind of object it is
```

```
str
```

If you run the cell above, you will get the same output as I did.

While we're not ready to run code just yet, you now see how easy it is to mix code and text in a notebook.
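
For when you are ready, here is a minimal, self-contained sketch of what the code cell above does. It writes a short sample file to a temporary directory, reads it back, and inspects the result. The file name `sample.txt` and its contents are made up for illustration; they stand in for the course data file.

```python
import tempfile
from pathlib import Path

# Made-up sample text standing in for the course data file.
sample = "This is a small sample text. It has two sentences."

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file name
    path.write_text(sample, encoding="utf-8")

    # Reading inside a with-block closes the file automatically.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

kind = type(text).__name__  # name of the object's type
n_chars = len(text)         # number of characters in the string
words = text.split()        # split the string on whitespace

print(kind, n_chars, len(words))
```

Once a file is read this way, it is an ordinary Python string, so everything you can do with a string (measuring it, slicing it, splitting it into words) is available to you.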

## Setting Up an Environment

Once you've installed [Miniconda](https://docs.conda.io/projects/conda/en/latest/index.html), setting up an environment couldn't be easier. From the command line, do this:

```bash
conda create --name 370 python=3.10
```

The way to read this line is as follows:

- **`conda`** tells the shell to run the conda program
- **`create`** tells the conda program that we are creating a virtual environment
- **`--name`** tells conda the name of the environment
- **`370`** is the name we are going to use (You can choose any name you like--I use "370" because it's the course number, that's all.)
- **`python=3.10`** tells conda what version of Python we want installed in this environment.

We now have two installations of Python within conda, in addition to any version of Python installed in the system itself. 

**Why so many versions of Python?**

First, we never want to mess with the system version of Python. Nope. Nah ah. Never. We don't know what else within the system depends on that, so working with that installation risks messing with the operating system itself. No thank you.

Second, we want to keep the `base` version of conda as is: it is, after all, how we manage everything else. And since none of these installations takes up much room (usually less than 1 GB), it's okay to have multiple versions.

**Why Python 3.10 when the newest version is 3.13?**

First, the cutting edge is the cutting edge: it will cut you. 

Second, it has been my experience that a number of the libraries with which we will be working, especially toward the end of the semester, are not yet updated to be compatible with the cutting edge. For now, **3.10** seems like a happy middle ground. (This could change during the semester, so we may be creating a new environment to keep up with the ever-shifting software landscape.)

You can get a list of your environments with the following command:

```bash
(base) ~ % conda env list
```

Conda returns the following on my machine:

```bash
# conda environments:
#
base                  *  /Users/jl/miniconda3
310                      /Users/jl/miniconda3/envs/310
311                      /Users/jl/miniconda3/envs/311
370                      /Users/jl/miniconda3/envs/370
DEV                      /Users/jl/miniconda3/envs/DEV
TED                      /Users/jl/miniconda3/envs/TED
```

You can see from that listing that I keep a couple of environments around for testing functionality with versions of Python, one for this class, and then two more for particular projects.
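
Since this is a text analytics course, it is worth noticing that the listing is itself a small piece of text we can process. Here is a minimal sketch, assuming the output format shown above: comment lines start with `#`, an optional `*` marks the active environment, and the path comes last with no spaces in it. The function name `parse_env_list` is my own invention, not part of conda.

```python
# Sample output, shortened from the listing above.
listing = """\
# conda environments:
#
base                  *  /Users/jl/miniconda3
310                      /Users/jl/miniconda3/envs/310
370                      /Users/jl/miniconda3/envs/370
"""


def parse_env_list(text):
    """Return (envs, active): a name-to-path dict and the active env's name."""
    envs = {}
    active = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        name, path = parts[0], parts[-1]
        if "*" in parts[1:-1]:
            active = name  # the asterisk marks the active environment
        envs[name] = path
    return envs, active


envs, active = parse_env_list(listing)
print(active)        # base
print(sorted(envs))  # ['310', '370', 'base']
```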

## Activate Your Environment

To work within a particular environment, you simply need to enter the following command:

```bash
conda activate 370
```

That's it. You will now see a different preface before your prompt:

```
(370) ~ %
```

If you had not noticed the `(base)` prefix before, note it now. (The **`%`** sign you see above, and may occasionally see in other notebooks when I am referring to the command line, is simply my preferred prompt. Other systems may use a dollar sign, `$`, or some other symbol. I am old enough that the standard Unix command line prompt was the percent sign, `%`, which distinguished it from the VAX command line prompt, the dollar sign. Yes, I'm that old.)
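
If you want to confirm from inside Python that the environment you expect is active, here is a minimal sketch. It relies on `CONDA_DEFAULT_ENV`, an environment variable that conda sets when an environment is activated; outside conda it is simply absent, so the code falls back to `None`. The function name `describe_environment` is made up for this illustration.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the Python that is currently running."""
    return {
        # Name of the active conda environment, or None outside conda.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Major and minor version of the running interpreter.
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Directory in which this Python is installed.
        "prefix": sys.prefix,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

With the `370` environment from above activated, I would expect `conda_env` to read `370` and `python` to read `3.10`. If you see something else, the environment is probably not the one you meant to activate.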

## Editing Your Configuration File

As it is now, every time you log into your computer and want to work on materials for this class you will, after you open your terminal or command shell, need to activate the 370 environment. That's not terrible, given how simple it is, but you can also set it up so that that gets done for you.
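
One way to do that, sketched here under some assumptions: add the line `conda activate 370` to the end of your shell's configuration file (on macOS with Z-shell that is typically `~/.zshrc`; with bash, `~/.bashrc` or `~/.bash_profile`). This assumes conda's own initialization block is already in that file, since the `activate` line has to come after it. You can add the line by hand in any text editor. The Python helper below, `ensure_line`, is a made-up name that does the same thing without ever adding the line twice, and the demonstration runs against a scratch file rather than your real configuration.

```python
import tempfile
from pathlib import Path


def ensure_line(config_path, line):
    """Append line to the file unless already present; return True if added."""
    path = Path(config_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in existing.splitlines():
        return False  # already present, nothing to do
    # Make sure the new line starts on a line of its own.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{separator}{line}\n")
    return True


# Demonstration on a scratch file standing in for a real config file.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "zshrc-example"  # hypothetical file name
    scratch.write_text("# existing settings\n", encoding="utf-8")

    first = ensure_line(scratch, "conda activate 370")   # adds the line
    second = ensure_line(scratch, "conda activate 370")  # finds it, no change
    result = scratch.read_text(encoding="utf-8")

print(first, second)
print(result)
```

Whichever way you add the line, open a new terminal window afterward: the prompt should start with `(370)` instead of `(base)`.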

## Getting Help

This class does not forbid the use of ChatGPT or any other of the LLMs (large language models). You are free to use them, but you should know that the answers to more complex problems are often useless. That is, an LLM is only as good as the code it has ingested, and there is a lot of bad code out there. 

The other problem with LLMs is that even when they do give you a useful answer, you don't necessarily understand how they arrived at that particular code. That is going to impede your ability to write more complex kinds of code later in the course.

What are your alternatives?

First, let's begin with something that Americans hate, *reading the directions*. Almost all of the software we will be using this semester comes with *documentation*:

- [Python](https://docs.python.org/3.10/) has it. (Link is to 3.10 version.)
- [Conda](https://docs.conda.io/projects/conda/en/latest/index.html) has it.
- [GitHub](https://docs.github.com/en) has it.

Almost every single one of the Python libraries, like the NLTK, also has documentation. 
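
Much of that documentation is also reachable from inside Python itself. Here is a small sketch using only the standard library: `help()` prints an object's documentation in an interactive session, and `inspect.getdoc()` returns the underlying docstring as a string you can work with. `str.split` is just an example target; any function or method will do.

```python
import inspect

# Fetch the documentation string for a built-in string method.
doc = inspect.getdoc(str.split)

# The first line of a docstring is usually a one-line summary.
summary = doc.splitlines()[0]
print(summary)

# In a notebook or interactive session, this prints the full text:
# help(str.split)
```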

In some cases, the documentation isn't helpful because you don't know what question you are trying to answer. Web searches will often turn up the term you need to use. And, when in doubt, I often turn to [StackOverflow][]. Be forewarned: search SO before asking a question, and if you ask a question, make sure you show what you have tried. The denizens of SO are very helpful, but they do look askance at people who have not yet tried to help themselves. 

[StackOverflow]: https://stackoverflow.com