# Getting Started

G. Richards
(2016, 2018, 2020, 2022, 2024)
This notebook contains most everything that we need to get started.  It draws heavily from classes taught by Zeljko Ivezic and Mario Juric, Andy Connolly (http://connolly.github.io/introAstroML/) and Karen Leighly ([seminar.ouml.org](http://seminar.ouml.org)), with updates to my own class drawn from [Stephen Taylor's class at Vanderbilt](https://github.com/VanderbiltAstronomy/astr_8070_s24).

This notebook is specifically for those with **Apple** laptops.  There will be a slightly different version for those using Windows.

## Contents

* 1. [Terminal - iTerm2](#one)
* 2. [Terminal Framework - Oh-My-Zsh](#two)
* 3. [Package Manager - Homebrew](#three) 
* 4. [Version Control - Git](#four)
* 5. [Python Package Installer - Mamba](#five)
* 6. [Integrated Development Environment - JupyterLab](#six)
* 7. [Deep Learning Software - TensorFlow + PyTorch](#seven)
* 8. [Review of Python, Numpy, Scipy, Matplotlib, and Pandas](#eight)
* 9. [Practice with Git](#nine)
* 10. [Class Participation](#ten)

## 1. Terminal - iTerm2  <a class="anchor" id="one"></a>

### What is a Terminal?

A terminal, also known as a command line interface (CLI), is a text-based interface used to interact with your computer. Unlike graphical user interfaces (GUIs) that use windows, buttons, and icons, a terminal allows users to execute commands by typing them out and receiving text feedback. This interface is powerful for running programs, managing system tasks, and accessing a direct line to the operating system's underlying functionalities.

### Why iTerm2?

[iTerm2](https://iterm2.com/) is a replacement for the default Terminal app on macOS. It supports macOS 10.14 and newer.

#### Key Features of iTerm2

- **Split Panes**: You can customize your workspace by splitting a terminal window into multiple panes, both vertically and horizontally. 

- **Hotkey Window**: Set up a hotkey to quickly bring iTerm2 to the front, regardless of which application you're using.

- **Search**: iTerm2 offers a search feature with highlighting of matches, including regular expression compatibility.

- **Autocomplete**: iTerm2 can suggest completions for words previously typed in your terminal, accessible with the shortcut `Cmd-;`.

#### Installing iTerm2

To install iTerm2 (again, for Apple users), follow these steps:

1. Go to the [iTerm2 downloads page](https://iterm2.com/downloads.html).
2. Download the stable release appropriate for your operating system version.


## 2. Terminal Framework - Oh-My-Zsh  <a class="anchor" id="two"></a>

### What is Oh My Zsh?

Oh My Zsh is an open-source, community-driven framework for managing your Zsh (Z shell) configuration. It's designed to be a drop-in replacement for your existing shell configuration, enhancing it with the ability to use hundreds of plugins and themes, which can increase productivity and ease of use.

#### Why Use Oh My Zsh?

- **Plugin System**: Oh My Zsh integrates hundreds of plugins that can aid in daily tasks like Git commands, npm, Docker, theme customization, and more.
- **Themes**: With the various themes available, you can customize the look of your command prompt.
- **Community Driven**: It is open-source and community-driven.

### Installing Oh My Zsh

To install Oh My Zsh, you can use `curl` or `wget`. Here's how to do it:

#### Via Curl
Run the following command in your terminal:

    sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Or

#### Via Wget

If you prefer `wget`, run this command in your terminal instead:

    sh -c "$(wget https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh -O -)"

Both methods will install Oh My Zsh and replace your current shell configuration with a Zsh configuration managed by Oh My Zsh.

## 3. Package Manager - Homebrew <a class="anchor" id="three"></a>

### What is Homebrew?

Homebrew is a free and open-source software package management system that simplifies the installation of software on Apple's macOS operating system and Linux. The tagline "Homebrew installs the stuff you need that Apple didn’t" sums it up.

#### Key Features of Homebrew

- **Simplifies Installation**: Homebrew makes it easy to install packages with a simple command (`brew install <package>`).
- **Isolation**: Homebrew installs packages to their own directory and symlinks their files into `/opt/homebrew` on Apple Silicon (or `/usr/local` on Intel Macs).
- **Non-Intrusive**: Homebrew won’t install files outside its prefix, ensuring it does not interfere with the system.
- **Flexibility**: Built on Git and Ruby, it allows advanced users to modify, update, and manage packages easily.

#### Why Use Homebrew?

Homebrew provides an enormous library of software and tools without the need for manual configurations and setups, making it a must-have tool for developers and data scientists working on macOS and Linux platforms.

#### Installing Homebrew

To install Homebrew, execute the following command in your macOS Terminal or Linux shell prompt:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

## 4. Version Control - Git  <a class="anchor" id="four"></a>

### What is Git?

Git is a version control system that helps track changes in computer files and coordinates work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files. Unlike many older tools like SVN or CVS, Git uses a distributed model, providing each developer a full history of all changes.

#### Why Use Git?

Understanding Git is crucial because it facilitates effective version control and collaboration.

- Snapshots: The core idea behind Git's design is to think of data as a series of snapshots of a mini filesystem. While other tools often use delta-based storage, focusing on the differences between files from one version to another, Git stores a snapshot of what all files look like at a given moment. This fundamental difference enables many of Git's most powerful features.

- Efficiency: Git’s snapshot-based model allows for quick branching and merging.

- Flexibility: With its distributed nature, every user has a full backup of the repository, allowing for work in various workflows, offline activity, and collaboration across teams.
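
To make the snapshot idea concrete, here is a toy sketch in Python. It is only an illustration, not Git's actual implementation: the `store` dictionary and the `blob_id` and `snapshot` helpers are hypothetical names invented for this example. Each file's content is stored once under a hash of that content, and a snapshot is simply a mapping from file names to hashes, so files that did not change are shared between snapshots rather than copied.

```python
import hashlib

# Toy content-addressed store: hash -> file content (hypothetical, for illustration)
store = {}


def blob_id(content: bytes) -> str:
    """Hash the content together with a small header, in the style Git uses for blobs."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files: dict) -> dict:
    """Record a snapshot: map every file name to the hash of its current content."""
    snap = {}
    for name, content in files.items():
        oid = blob_id(content)
        store[oid] = content  # identical content is stored only once
        snap[name] = oid
    return snap


# Two "commits": only plot.py changes between them
v1 = snapshot({"notes.txt": b"hello\n", "plot.py": b"print(1)\n"})
v2 = snapshot({"notes.txt": b"hello\n", "plot.py": b"print(2)\n"})

print(v1["notes.txt"] == v2["notes.txt"])  # True: the unchanged file is shared
print(len(store))                          # 3 stored contents for 4 file entries
```

Because each snapshot is a complete listing, switching between versions only requires looking up hashes, which is part of why branching in Git is cheap.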


### Installing Git on macOS

Before you start using Git, you need to install it on your computer. Here’s how you can install Git on macOS:

#### Using Xcode Command Line Tools

Open the Terminal and type:

    git --version

If Git is not already installed, this command will prompt you to install the Xcode Command Line Tools, which include Git.

#### Using a Binary Installer

If you need a more up-to-date version of Git, you can opt for the binary installer from the Git website. This method ensures you have the latest features and security updates.

Visit the Git website to download the macOS installer: [Git for macOS](https://git-scm.com/download/mac).  Once downloaded, run the installer and follow the on-screen instructions.

### A Place for Git

Once Git is installed, make a place where you are going to put the class repository.  Perhaps in a git subdirectory if you think that you might use more repositories later (we might even for this class!).  
```
$ cd ~
$ mkdir git
$ cd git
```

### Forking the class repository

- If you don't already have a [GitHub account](https://github.com/), please make one.  If you have one already, go ahead and login.

- Now we need to "fork" the class repository into your account using this [example](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo#forking-a-repository). 

- If that works for you, next "fork" the [PHYS 440/540 GitHub Repo](https://github.com/gtrichards/PHYS_440_540).

Now you have a saved and easily accessible copy of the class repository. 

### Cloning the class repository

Next, we need to "clone" (make a local copy of) your forked repository.


First move to the `git` folder that you created above:
```
$ cd ~/git
```

Then execute this command (filling in the blank as appropriate):

```
$ git clone https://github.com/{YOUR_GITHUB_USERNAME}/PHYS_440_540.git
```    

where `YOUR_GITHUB_USERNAME` is your GitHub username. This will make a subdirectory called `PHYS_440_540` in which you will see a file called `InitialSetupApple.ipynb` (which is the file that you are reading now!).  See below for how to open this (and other notebooks) using [Jupyter](http://jupyter.org/).

Before proceeding further, you also need to set the original class repository as the 'upstream', so that you can get the latest changes before each class. To do so, run:

```
$ cd PHYS_440_540
$ git remote add upstream https://github.com/gtrichards/PHYS_440_540.git
$ git remote set-url --push upstream DISABLE
```

**Before each class**, you will want to update this repository so that you get any new files that we have put there for you. Do that with

```
$ cd ~/git/PHYS_440_540
$ git fetch upstream
$ git checkout master
$ git merge upstream/master
```

### Tips:
1. To avoid unnecessary merge conflicts involving executed Jupyter notebooks, try not to execute/save/commit the corresponding class notebooks in the master branch before the actual class. Feel free to do it in a separate [branch](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging).  If you need help with branches, see [Chapter 4 of DataCamp's Introduction to Git lesson](https://app.datacamp.com/learn/courses/introduction-to-git).  
2. If you happened to execute/save any class notebooks ahead of schedule, you can run the following command to get back to the last commit before syncing your repository with the class repository.  (Move any changes that you want saved to another folder if you don't want to lose your work!)

```
$ git reset --hard
```

## 5. Python Package Installer - Mamba  <a class="anchor" id="five"></a>



### What is Mamba?

Mamba is a re-implementation of the conda package manager in C++, designed to be much faster and more responsive. For users familiar with conda, Mamba offers a drop-in replacement that can handle environment and package management more efficiently. It is particularly advantageous in scenarios where conda might take a long time to solve package dependencies.

#### Why Use Mamba?

Mamba's quick dependency resolution saves time and reduces complexity in maintaining various environments.

Mamba is cross-platform and works on Windows, macOS, and Linux (including ARM64 and PPC64LE), ensuring broad compatibility with various system architectures.

#### Mamba vs. Conda

Mamba utilizes the same commands as conda, so existing conda users will find it familiar. However, the implementation differences mean Mamba can offer significant performance improvements, especially when working with complex dependencies.

#### What's the difference between Mambaforge and Miniforge?

Since August 2023, with the release of Miniforge 23.3.1, Mambaforge and Miniforge have essentially become identical. Both distributions now include mamba thanks to the integration of the `conda-libmamba-solver`. The primary difference lies in the name and default installation path.

### Mamba Installation

#### Fresh Install with Miniforge (Recommended)

Miniforge is recommended for new users. It comes with the `conda-forge` channel pre-configured, simplifying package management:

1. Miniforge via Homebrew:

         brew install miniforge
     

OR

2. Download Miniforge:
   - Use the following commands to download and install Miniforge:

         curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
         bash Miniforge3-$(uname)-$(uname -m).sh

#### Usage

- **Unix-like Platforms (Mac OS & Linux)**: If Miniforge is on the system path (which is default on Mac and Linux), you can use the conda and mamba commands from any command prompt. The default channel will be set to `conda-forge`.

- To make sure that it is on your path do:

        conda init zsh


### Python

Installing Mamba through Miniforge also ensures that Python is installed on your system.

To check your python version:

    python --version
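
The same check can be made from inside Python, which is handy in a notebook where it is not always obvious which interpreter is running. The sketch below uses only the standard library; it assumes nothing about where Miniforge was installed and simply reports what it finds.

```python
import shutil
import sys

# Version of the Python interpreter that is running this code
print("Python version:", sys.version.split()[0])

# Full path of that interpreter; after a Miniforge install it normally
# points somewhere inside the Miniforge directory
print("Interpreter:", sys.executable)

# shutil.which returns the full path of a command, or None if it is not on the PATH
tool_paths = {tool: shutil.which(tool) for tool in ("conda", "mamba")}
for tool, path in tool_paths.items():
    print(f"{tool}: {path if path else 'not found on PATH'}")
```

If `conda` or `mamba` is reported as not found, rerunning `conda init zsh` and opening a new terminal window is a reasonable first thing to try.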

The course requirements were set up to ensure that you have at least *some* (extended) experience with Python.  E.g., you might have taken the 113-114-115 series.  If you don't know *any* Python at all, this class may be pretty tough going.  However, many of you might need a refresher.  A good place to start is **Appendix A** (more specifically A3) in the [textbook](https://press.princeton.edu/books/hardcover/9780691198309/statistics-data-mining-and-machine-learning-in-astronomy).  

In this course, we will be taking advantage of the very nice tools developed by the folks at [DataCamp](https://www.datacamp.com/).  You will all be given access to DataCamp by virtue of being enrolled in this course.  As a result you will actually have access to their full library for 6 months.  If you need a Python refresher, I suggest that you work through their [Intro to Python for Data Science](https://www.datacamp.com/courses/intro-to-python-for-data-science) and [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science); indeed you'll do one of these for Homework 1.

I would also recommend the [codecademy](https://www.codecademy.com/) course on Python.  It is free and does a good job of walking you through things that you need to learn.

You might also be interested in [A Student’s Guide to Python for Physical Modeling](https://press.princeton.edu/books/ebook/9780691270890/a-students-guide-to-python-for-physical-modeling): Chapters 1-3 provide a good introduction to Python before getting into the "physical modeling" part.  It encourages the use of **spyder**, which is a matlab-like interactive interface for Python (if you'd prefer that sort of interface).

Lastly, from Jake VanderPlas (a contributor to astroML and a co-author of our textbook):

- [A Whirlwind Tour of Python](https://www.oreilly.com/library/view/a-whirlwind-tour/9781492037859/)
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html)

Note that you can access the entire [O'Reilly library](http://ezproxy2.library.drexel.edu/login?url=https://www.safaribooksonline.com/library/view/temporary-access/?orpq&email=^u) for free with your Drexel login (not by institution).

## 6. Integrated Development Environment - JupyterLab  <a class="anchor" id="six"></a>

### Introduction to JupyterLab

JupyterLab is an interactive development environment and the next-generation web-based user interface for Project Jupyter. JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. It is particularly well-suited for data science and scientific computing thanks to its support for multiple languages, in-line display of outputs, and robust tools for visualization.

#### Key Features of JupyterLab

- **Web-based Interface**: JupyterLab provides a web-based interface that allows you to combine code, markdown text, visualizations, and rich media in one.
- **Multilingual Support**: It supports multiple programming languages, making it versatile for various types of projects.
- **Collaboration and Sharing**: Built-in features support sharing and collaboration, making it easier to work on projects with peers.
- **In-line Outputs**: Outputs from code (like visualizations) are displayed right in the notebook, alongside the code that generates them.
- **Extensibility**: JupyterLab can be extended with third-party extensions, enhancing its functionality and integrating additional tools.

### Installing JupyterLab

JupyterLab can be installed using Mamba, which provides a faster and more reliable installation process compared to conda. Follow these steps to install JupyterLab:


1. Install JupyterLab using Mamba
         
       mamba install jupyterlab

   
2. Install Jupyter Notebook using Mamba
   
       mamba install notebook

#### Running JupyterLab

Once JupyterLab is installed, you can start it by running the following command in your terminal:

    jupyter lab

This command will start the JupyterLab server and open JupyterLab in your default web browser. You will see an interface where you can create new notebooks, open existing ones, and access other tools like the terminal and text editor.

Each lecture will be in the form of a [jupyter notebook](http://jupyter.org/) that you'll download from the github repository before the start of each class.  

To start the notebook app, run:

     jupyter notebook
    
This will pop up a web page in your default browser and show the Dashboard where you can navigate to a particular notebook or open a new one.  You can also open a notebook directly with (for example):

     jupyter notebook InitialSetup.ipynb &
   
If you are creating a new notebook, click on 'new' at the top-right of the page and select 'python'.

For class, I recommend that you first copy each notebook, appending your name to the file.  That way you aren't editing the original and you'll have a version that you can turn in (for the sake of attendance).  For example,

    cp InitialSetup.ipynb InitialSetup_FirstnameLastname.ipynb
    
And then open that notebook with the command 

    jupyter notebook InitialSetup_FirstnameLastname.ipynb &

### Working with Jupyter Notebooks

Notebooks have 2 primary cell types: 'Markdown' and 'Code'.    The Markdown cells are basically just for you to read.  The Code cells are meant to be executed (perhaps after you have filled in some blanks).  

To execute a cell in the notebook, type '**shift-return**'. If you see a * in the bracket left of the cell or a 'Busy' in the tab title, it means the command is in the process of running, and you need to wait for it to finish.

The notebook is autosaved, so that when you return to a notebook, everything is the same as you left it. If you want to reset it, you can do “revert to checkpoint”.  If you save it yourself, you will create a checkpoint, and the original version will be unavailable.



Here are some useful commands to get you started (# followed by text indicates a comment and not what you type)!

    In [ ]: ?          # basic help function.  Pops open a sub-frame at the bottom of page.  
                         Close with "x".
    
    In [ ]: %quickref  # Pops open a quick reference card
    
    In [ ]: ! 	       # spawning out to the operating system; 
                         e.g files=!ls will fill the variable files 
                         with a list of files in your directory.


Jupyter also contains a number of [“magic” commands](https://ipython.org/ipython-doc/dev/interactive/magics.html).
Two examples are

    In [ ]: %matplotlib inline  # makes plots within the web browser instead of popping up a new window
    
    In [ ]: %whos               # lists the vectors, etc. that you have defined.

Jupyter also has lots of [keyboard shortcuts](https://ipython.org/ipython-doc/1/interactive/notebook.html#keyboard-shortcuts) – the main one is shift-enter to run a cell.

## 7. Deep Learning Software - TensorFlow + PyTorch <a class="anchor" id="seven"></a>

### Introduction to TensorFlow

TensorFlow is an open-source machine learning framework developed by Google. It is widely used in the industry and academia for building and training deep learning models. TensorFlow's architecture allows for easy deployment of computation across a variety of platforms (CPUs, GPUs), enhancing its flexibility and scalability.

#### What Does TensorFlow Do?

TensorFlow allows users to create sophisticated deep learning models that can engage in a variety of tasks such as:
- **Handwritten Digit Classification**: Automatically identifying numbers written by hand.
- **Image Recognition**: Identifying and classifying objects within images.
- **Word Embeddings**: Transforming words into numerical vectors that represent their semantic meanings.
- **Natural Language Processing (NLP)**: Enabling machines to understand and interpret human language.

TensorFlow's computational graph approach allows for intuitive design of complex neural networks, while its execution efficiency on different types of hardware makes it a versatile tool for machine learning.

### Requirements for Installing TensorFlow on Mac

To install TensorFlow on a Mac, particularly one with Apple silicon (M1, M2 chips) or AMD GPUs, ensure your system meets the following requirements:
- **Mac Computers**: Apple silicon or AMD GPUs.
- **macOS Version**: 12.0 or later.
- **Python Version**: 3.8 or later.
- **Xcode Command-Line Tools**: Installed using the command:
  ```bash
  xcode-select --install
  ```


### Installing TensorFlow on macOS

TensorFlow can take advantage of Apple's Metal API to significantly accelerate training on Mac GPUs through the tensorflow-metal PluggableDevice. Follow these steps to install TensorFlow and tensorflow-metal:

#### Install Base TensorFlow:
For TensorFlow version 2.13 or later, use Mamba for a faster installation:

    mamba install tensorflow

### Verifying TensorFlow Installation

After installation, it's good practice to verify that TensorFlow is working correctly. You can do this with a simple Python script:

    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
    print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

This script checks the TensorFlow version and how many GPUs are available to TensorFlow, indicating whether the tensorflow-metal plugin is functioning correctly.

### Introduction to PyTorch

PyTorch is an open-source machine learning framework that powers a range of applications that involve deep learning, such as image recognition and natural language processing. Developed by Facebook AI Research and other contributors, PyTorch is notable for its ease of use, flexibility, and efficient computational graph dynamics. It is written in Python and features a user-friendly front end with a robust, GPU-accelerated backend, making it a preferred tool for researchers and developers for rapid prototyping and development of deep learning models.

### Why Use PyTorch?

- **Dynamic Computation Graphs**: PyTorch allows for modifications to the computation graph on the fly with its reverse-mode auto-differentiation feature. This is especially useful for dynamic model adjustments during runtime.
- **Intuitive Interface**: Offers an intuitive interface that helps in building and training deep learning models with ease, thanks to its straightforward syntax and processing capabilities.
- **Strong GPU Support**: Provides extensive support for GPU acceleration, enabling efficient and fast processing necessary for training complex models involving large datasets.

### Installation Prerequisites

- **macOS Version**: Supported on macOS 10.15 (Catalina) or above.
- **Python**: Python 3.8 to 3.11 is recommended for compatibility. Python can be installed via Anaconda, Homebrew, or directly from the Python website.

#### Note on Python Version Support
In future PyTorch releases, support for Python 3.8 will be deprecated.

### Installing PyTorch on macOS

PyTorch installation can be performed using Anaconda or pip. Anaconda is recommended as it manages dependencies more comprehensively.

1. **Install PyTorch with Mamba**:
   Use Mamba to install PyTorch and torchvision from the PyTorch channel:

       mamba install pytorch torchvision -c pytorch

### Verifying PyTorch Installation

To verify that PyTorch has been installed correctly, run the following Python code to check the functionality by creating a randomly initialized tensor:

    import torch
    x = torch.rand(5, 3)
    print(x)

The output should display a 5x3 matrix with random values, confirming that PyTorch is functioning properly on your system.

### Additional Tool: PyTorchViz

PyTorchViz is a small package that helps in creating visualizations of PyTorch execution graphs and traces, which can be helpful for debugging and understanding model architectures.

#### Installation

    pip install torchviz

## 8. Review of Python Basics  <a class="anchor" id="eight"></a>

To start with, Python uses the following [numerical data types and operations](https://docs.python.org/2/library/stdtypes.html#boolean-operations-and-or-not).

We will encounter three main types of collections of data:

    Lists: a mutable array of data
    Tuples: ordered, immutable list
    Dictionaries: keyword/value lookup

It is worth noting that Python begins indexing at 0 and that multidimensional arrays (e.g., in NumPy) use [row-major order](https://en.wikipedia.org/wiki/Row-major_order) by default, which means that an element is addressed by (row, column), i.e., pairs of values are written as (y,x) instead of (x,y).  This actually makes sense when one is dealing with tables of data.
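
As a small illustration of zero-based indexing and the row-first convention, here is a sketch using a plain nested list (the `table` values are made up for the example):

```python
# A small table with 2 rows and 3 columns, stored as a list of rows
table = [
    [10, 11, 12],  # row 0
    [20, 21, 22],  # row 1
]

row, col = 1, 2
value = table[row][col]  # row index first, then column index
print(value)             # 22

first_row = table[0]                # indexing starts at 0
first_col = [r[0] for r in table]   # the 0th entry of every row
print(first_row)   # [10, 11, 12]
print(first_col)   # [10, 20]
```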


Tuple

    its most notable property is that it is immutable – after being defined, it cannot be changed
    denoted by parentheses, e.g., x=(5.0,7.0,9.0,11.0)
    to index, use square brackets, e.g., print(x[0])
    can get part of one using :, e.g., try print(x[2:]) etc.

List

    in contrast to a tuple, it is mutable: its elements can be changed after it is defined
    denoted with brackets, e.g., y=[5.0,7.0,9.0,11.0]
    to index, also use square brackets, e.g., print(y[0])
    can get part of one using :, e.g., try print(y[2:]) etc.

Dictionary

    assigns a value to a key, for example z={'a':2,'b':4,'c':6}, where a,b,c are the keywords.
    the dictionary is indexed by the keyword, e.g., print(z['a'])
    they can be quite complicated.
    
Here is a lot more on data structures: [https://docs.python.org/2/tutorial/datastructures.html](https://docs.python.org/2/tutorial/datastructures.html).
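
The three collection types described above can be tried out in a single cell. This is just a sketch using the same example values; the variable `tuple_error` is introduced here only to hold the error message.

```python
x = (5.0, 7.0, 9.0, 11.0)      # tuple: immutable
y = [5.0, 7.0, 9.0, 11.0]      # list: mutable
z = {'a': 2, 'b': 4, 'c': 6}   # dictionary: key/value lookup

# Indexing and slicing work the same way for tuples and lists
print(x[0])    # 5.0
print(x[2:])   # (9.0, 11.0)

# A list can be changed in place
y[0] = -1.0
y.append(13.0)
print(y)       # [-1.0, 7.0, 9.0, 11.0, 13.0]

# A tuple cannot: trying to assign to an element raises a TypeError
try:
    x[0] = -1.0
except TypeError as err:
    tuple_error = str(err)
    print("Cannot modify a tuple:", tuple_error)

# A dictionary is indexed by key, and new keys can be added
z['d'] = 8
print(z['a'], z['d'])   # 2 8
print(list(z.keys()))   # ['a', 'b', 'c', 'd']
```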

### Methods and Attributes

Each type of data structure has associated “methods”. A method is like a little built-in function that can be run on a data structure.

For example, open a new cell (or in a python terminal window) and do the following:

```python
v=[27.0,35.0,101.0,57.0]
print(v)
v.sort()
print(v)
```

So v has now been replaced with the sorted v.

The parentheses indicate that this is a "method", which is something that can (but doesn't have to) take *arguments*.  An "attribute" on the other hand is just a property.  E.g., if you have an object called `shape` and shape contains information like `area`, `length`, `width`, etc., then you access those values as

```python
print(shape.area)
```
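
To see the difference in action, here is a minimal sketch with a made-up `Rectangle` class (a hypothetical example, not part of any library). `length`, `width`, and `area` are attributes, accessed without parentheses, while `scale` is a method that takes an argument.

```python
class Rectangle:
    """Hypothetical example class used only to illustrate attributes and methods."""

    def __init__(self, length, width):
        # Attributes: plain properties stored on the object
        self.length = length
        self.width = width
        self.area = length * width

    def scale(self, factor):
        """Method: multiply both sides by `factor` and update the area."""
        self.length *= factor
        self.width *= factor
        self.area = self.length * self.width


shape = Rectangle(3.0, 2.0)
print(shape.area)   # attribute access, no parentheses: 6.0

shape.scale(2)      # method call, with an argument
print(shape.area)   # 24.0
```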

Basic mathematical operations that can be applied to your data are found here: [https://docs.python.org/2/library/math.html](https://docs.python.org/2/library/math.html).

OK, hopefully that is a refresher for everyone, so let's get going with more complicated stuff.
If you need more of a refresher than that, see the links above.

## Installing Python Data Science Packages with Mamba

Mamba is a fast and efficient package manager that simplifies the process of managing Python packages. In this section, we will use Mamba to install some of the most commonly used data science and machine learning packages: NumPy, Pandas, Matplotlib, SciPy, and astroML.

### Installing Packages

To install NumPy, Pandas, Matplotlib, SciPy, and astroML, run the following command in your terminal:

    mamba install numpy pandas matplotlib scipy astroml


### Verify Installation

After installing the packages, you can verify that they are correctly installed by importing them in a Python script or Jupyter notebook:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import scipy
    import astroML

If no errors are raised, it means the packages have been successfully installed and are ready for use.
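
If you also want to know which versions were installed, one option is the standard-library `importlib.metadata` module (available in Python 3.8 and later). The sketch below assumes the distribution names match the names used in the `mamba install` command, with astroML spelled as in its import; a package that cannot be found is reported rather than raising an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to look up (assumed to match the install command above)
packages = ["numpy", "pandas", "matplotlib", "scipy", "astroML"]

installed = {}
for name in packages:
    try:
        installed[name] = version(name)
    except PackageNotFoundError:
        installed[name] = None   # not installed in the current environment

for name, ver in installed.items():
    print(f"{name:12s} {ver if ver else 'NOT INSTALLED'}")
```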

### NumPy

NumPy is short for Numerical Python. It is the foundational package for scientific computing in Python. It is a library which will allow us to work with data structures called **arrays** that are more efficient for storing and manipulating data than other Python data structures (or C++ for that matter). 

For example, cut and paste this into a new cell, then type shift-enter to run it:

```python
import numpy as np
xlist = [1,2,3]
xarray = np.array(xlist)
twice_xarray = 2*xarray
print(twice_xarray)
```

This is far better than creating a for loop over the entries in `xlist` or even something fancier like a "list comprehension"
```python
   twice_xlist = [2*x for x in xlist]
   print(twice_xlist)
```
Note the lack of commas in the printed array as compared to the printed list.

To load the Numpy library type:  

    In [ ]: import numpy as np
    
In fact, just plan on starting **every** notebook with this!    

The basic unit for numpy is an [ndarray](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html).  See the link for examples of how to define, index, slice, etc. the array.

There are quite a few **methods** associated with numpy arrays.  Some useful ones include:

|Method |	Property|
|-------|-----------|
|np.zeros(5,float) |	yields a 5-element array of zeros of type float |
|a=np.empty(4) |	yields a 4-element empty array |
|a.fill(5.5) |	fills that array with 5.5 for all elements |
|np.arange(5) |	yields an integer array of length 5 with increasing values | 
|b=np.random.normal(10,3,5) | yields a 5 element array of normally distributed numbers with mean 10 and standard deviation 3 |
|mask=b > 9 | creates a boolean array determining which numbers are greater than 9 |
|print(b[mask]) |	prints the ones with values > 9 |
|b[mask]=0 | sets the ones > 9 to zero |
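
Here is a short sketch that runs through the entries in the table. One substitution to be aware of: instead of `np.random.normal`, it uses a seeded generator from `np.random.default_rng` so that the run is repeatable (the seed 42 and the sample size of 1000 are arbitrary choices for this example). With that many samples you can check that the second argument is the standard deviation, not the variance.

```python
import numpy as np

print(np.zeros(5, float))   # five zeros of type float
a = np.empty(4)             # uninitialized values; fill before use
a.fill(5.5)
print(a)
print(np.arange(5))         # integers 0 through 4

# Seeded generator so that the numbers are the same on every run
rng = np.random.default_rng(42)
b = rng.normal(10, 3, 1000)   # mean 10, standard deviation 3, 1000 samples
sample_mean, sample_std = b.mean(), b.std()
print(sample_mean, sample_std)   # close to 10 and 3

mask = b > 9                  # boolean array, True where b > 9
n_above = int(mask.sum())     # each True counts as 1
print(n_above, "values are above 9")

b[mask] = 0                   # set those values to zero in place
print(b.max() <= 9)           # nothing above 9 is left
```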

Arrays can be multidimensional, e.g., ```c=np.random.normal(10,3,(2,4))```, creates a 2 x 4 array with normally distributed numbers with mean 10 and standard deviation 3.

More methods for multidimensional arrays:

|Method |	Property|
|-------|-----------|
|d=c[0,:] |	grabs the first (0th) row of c. |
|d=c[1,:] |	grabs the second (1st) row of c. |
|d=c[:,0] |	grabs the first column of c. |
|c.dtype |	data type |
|c.size |	total number of elements |
|c.ndim |	number of dimensions |
|c.shape |	shape or dimensionality |
|c.nbytes |	memory used (bytes) |
|c.min() |	gives the minimum of c |
|c.max() |	gives the maximum of c |
|c.sum() |	sum of all elements |
|c.mean() |	mean of all elements |
|c.std() |	standard deviation of all elements |
|c.sum(axis=0) | sums along the 0th axis (i.e., over the rows), giving one sum per column. The result has one fewer dimension than c |
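
The entries in this table are easier to follow with an array whose values are known in advance. The sketch below uses `np.arange(8).reshape(2, 4)` in place of random numbers (a choice made for this example only) so that each result can be checked by hand.

```python
import numpy as np

# 2 x 4 array with known values:
# [[0 1 2 3]
#  [4 5 6 7]]
c = np.arange(8).reshape(2, 4)

print(c.shape, c.ndim, c.size)   # (2, 4) 2 8

first_row = c[0, :]   # values 0, 1, 2, 3
first_col = c[:, 0]   # values 0, 4
print(first_row, first_col)

print(c.sum())        # 28, the sum of all elements

col_sums = c.sum(axis=0)   # collapses the rows: one sum per column (4, 6, 8, 10)
row_sums = c.sum(axis=1)   # collapses the columns: one sum per row (6, 22)
print(col_sums, row_sums)
```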

You can also operate with arrays, for example, adding them together, multiplying them, multiplying or adding a constant. There are, however, [**broadcasting**](https://numpy.org/doc/stable/user/basics.broadcasting.html) rules so that you need to make sure you know what you are doing when dealing with arrays of different sizes.
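
As a rough illustration of broadcasting, the sketch below combines a 2 x 3 array with a constant, with a length-3 array, and with a 2 x 1 array, and then shows a combination whose shapes are incompatible. The array values are made up for the example.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])   # shape (3,)
col = np.array([[100.0], [200.0]])   # shape (2, 1)

print(a + 1)             # the constant is added to every element
plus_row = a + row       # `row` is added to each row of `a`
plus_col = a + col       # `col` is added to each column of `a`
print(plus_row)
print(plus_col)

# Shapes (2, 3) and (2,) do not match along the last axis, so this fails
try:
    a + np.array([1.0, 2.0])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Broadcast failed:", err)
```

NumPy compares shapes starting from the last axis, and two axes are compatible when they are equal or when one of them is 1.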

### SciPy


[SciPy](http://docs.scipy.org/doc/scipy/reference/) is an "open-source software for mathematics, science, and engineering".  We 'import' it the same way that we import numpy:


    In [ ]: import scipy as sp


SciPy is a suite of tools for data analysis, including integration, statistical functions, and interpolation.  It is built on top of NumPy. Where NumPy is intended for array *manipulation*, SciPy is intended for *analysis*. This is where you will find equivalents of the useful tools that are built into IDL (such as tsum, interpol, etc.).  It may be worth looking at the user's guide to get an idea of the kinds of functions that are available.   N.B.  For some packages you need to import more than just the main scipy package.  So, for example, to see what is available in the scipy.integrate package, do the following:

```
import scipy.integrate
scipy.integrate?
```
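
As one small example of what `scipy.integrate` offers, the sketch below uses `quad` to integrate two functions whose answers are known analytically. The integrands are chosen for this example only; `quad` returns both the value of the integral and an estimate of the absolute error.

```python
import numpy as np
from scipy import integrate

# Integral of sin(x) from 0 to pi; the exact answer is 2
value, abs_error = integrate.quad(np.sin, 0, np.pi)
print(value, abs_error)

# Integral of x^2 from 0 to 3; the exact answer is 9
value2, abs_error2 = integrate.quad(lambda x: x**2, 0, 3)
print(value2, abs_error2)
```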



### Pandas

[Pandas](https://pandas.pydata.org/) is "a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."

We 'import' it the same way that we import numpy:


    In [ ]: import pandas as pd

Pandas is a library designed with manipulating numerical tables and time series in mind and is a very useful tool for data science.   We will be making use of Pandas [DataFrames](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) in the course of this class. You can get some experience using Pandas either with [DataCamp](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python) or with [Chapter 3 of the Python Data Science Handbook by Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html).

```
import pandas as pd
pd?
```
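Since we will be using DataFrames a lot, here is a minimal sketch of the basic workflow: build a table, look at it, select rows, and add a column. The column names and values below are a made-up toy catalog, not real data.

```python
import pandas as pd

# Hypothetical toy catalog; names and values are invented for illustration
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "redshift": [0.5, 1.2, 2.1, 0.9],
    "mag": [18.2, 19.5, 20.1, 17.8],
})

print(df.head())       # first few rows
print(df.describe())   # summary statistics for the numeric columns

# Boolean masks select rows, much like NumPy arrays
bright = df[df["mag"] < 19.0]
print(bright)

# New columns can be computed from existing ones
df["is_high_z"] = df["redshift"] > 1.0
print(df.groupby("is_high_z")["mag"].mean())
```

Note that `bright` keeps the original row labels (0 and 3 here), which is one of the ways a DataFrame differs from a plain NumPy array.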

### Scikit-learn

The [Scikit-learn library](http://scikit-learn.org/) forms the core of computing tools that we will use for this class.  It probably needs its own introductory tutorial as both the input and output may not be quite what you would have expected.    You might want to spend some time going through their quick start guide, user's guide and tutorials here: [http://scikit-learn.org/stable/documentation.html](http://scikit-learn.org/stable/documentation.html).
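To preview what "not quite what you would have expected" means, here is a minimal sketch of the usual scikit-learn pattern (create an estimator, `fit`, then `predict`) using `LinearRegression` on synthetic data. The data, seed, and variable names are assumptions made for this example. The main thing to notice is that the input must be a 2-D array of shape `(n_samples, n_features)`, even when there is only one feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a straight line y = 2x + 1 with Gaussian noise
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# scikit-learn expects X with shape (n_samples, n_features)
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

# Fitted parameters are stored in attributes ending with an underscore
print(model.coef_, model.intercept_)

# predict also takes a 2-D array and returns a 1-D array
y_pred = model.predict(np.array([[12.0]]))
print(y_pred)
```

Most of the other estimators that we will use follow the same `fit`/`predict` interface, so it is worth getting comfortable with these shapes early.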


### Matplotlib Example

It is generally useful to be able to visualize your data.  We will do that using the 
[matplotlib](http://matplotlib.org/) library.   Most of you should be familiar with it already, but we will likely be making some plots that are more complex than you are used to.  One nice resource is this [thumbnail gallery](http://matplotlib.org/gallery.html) which you can use to figure out how to make a new plot.

Let's make sure that everything is working by making a simple plot

    # magic command to make the figure appear below the cell instead of in a new window
    %matplotlib inline

    # import pyplot from matplotlib and give it an alias
    import matplotlib.pyplot as plt
    import numpy as np

    # sample two functions on a common grid
    x = np.linspace(0, 3*np.pi, 300)
    y = np.sin(x)
    y2 = np.sin(x**2)

    # plot each curve once, with a label for the legend
    plt.plot(x, y, label=r'$\sin(x)$')
    plt.plot(x, y2, label=r'$\sin(x^2)$')
    plt.title('Some functions')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.grid()
    plt.legend();

You can change both the [marker/line styles](http://matplotlib.org/examples/lines_bars_and_markers/) and [colors](http://matplotlib.org/api/colors_api.html), but I highly recommend using the colors in the [palettable library](https://jiffyclub.github.io/palettable/); you can preview many of its color schemes at [http://colorbrewer2.org/](http://colorbrewer2.org/).

Try making some changes to the code above and see what happens.
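If you want a starting point, here is one possible variation: a sketch that sets the color, line style, and markers explicitly. The particular colors and styles are arbitrary choices for illustration, and in the notebook it assumes `%matplotlib inline` has already been run.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 3 * np.pi, 300)

fig, ax = plt.subplots()

# Dashed blue line for sin(x)
ax.plot(x, np.sin(x), color="tab:blue", linestyle="--", linewidth=2,
        label=r"$\sin(x)$")

# Orange circles (no connecting line) for every 10th point of cos(x)
ax.plot(x[::10], np.cos(x[::10]), color="tab:orange", linestyle="none",
        marker="o", markersize=4, label=r"$\cos(x)$")

ax.set_title("Changing line styles and colors")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid()
ax.legend();
```

This uses the `fig, ax = plt.subplots()` interface rather than calling `plt.plot` directly; both work, but the former tends to scale better to figures with several panels.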

### AstroML Example

Now let's download and run one of the astroML examples inside the Jupyter notebook.

    ! curl -O https://raw.githubusercontent.com/astroML/astroML/master/examples/datasets/plot_SDSS_SSPP.py

    %matplotlib inline
    %run ./plot_SDSS_SSPP.py

## 9. Git Example  <a class="anchor" id="nine"></a>

Lastly, let's practice some git!

1. create a new repository on GitHub with a README file. Name it `PHYS_440_540_test`.

2. clone it using:
```bash
cd ~/git
git clone https://github.com/YOURUSERNAME/PHYS_440_540_test
```
3. start Jupyter in the cloned directory
```bash
cd PHYS_440_540_test
jupyter notebook &
```
4. create a new notebook. Name it `hello.ipynb` (with File:Rename), and add a cell with the following piece of code:
```python
print("Hello World!")
```
5. add the notebook to your git repository and commit by running (in the terminal window) the following:
```bash
git add hello.ipynb
git commit -m "Added hello.ipynb to repository."
```
6. see what happened:
```bash
git log
```
7. make another change in the Jupyter notebook. For example, add another cell ("+" icon on the toolbar) with the following:
```python
x = 2+2
print(x)
```
8. see what happened
```bash
git status
```
9. commit changed files
```bash
git commit -am "Updated hello.ipynb with complex mathematics."
```
10. Push the Changes to GitHub Using a Personal Access Token

GitHub no longer supports password authentication for Git operations. Instead, you need to use a personal access token (PAT).

**Step 1: Generate a Personal Access Token**

1. Go to GitHub and log in.
2. Navigate to Settings > Developer settings > Personal access tokens > Tokens (classic).
3. Click Generate new token.
4. Name the token (e.g., "Git for PHYS_440_540_test").
5. Set an expiration date or choose "No expiration".
6. Under Select Scopes, check `repo` for full control over your repositories.
7. Generate the token and copy it immediately (you won't be able to see it again).

**Step 2: Push the Changes Using the Token**

When you push changes to GitHub, you will use your GitHub username and the personal access token as your "password."

Push the changes to GitHub:

    git push

When prompted for your username, enter your GitHub username:

    Username for 'https://github.com': YOURUSERNAME

When prompted for your password, paste the personal access token instead of your password:

    Password for 'https://YOURUSERNAME@github.com': <paste your token here>

11. Check the Changes on GitHub

    Open your browser and go to the GitHub repository page (e.g., https://github.com/YOURUSERNAME/PHYS_440_540_test).
    You should see the hello.ipynb file and the commits you made.

## 10. Class Participation <a class="anchor" id="ten"></a>

To get full credit for class participation, you will be submitting this notebook (and that for every other lecture) with all of the cells filled in (each lecture will have a number of cells with blanks to provide a hands-on aspect to the course).

The instructions for doing this are the first assignment in [Homework 1](https://github.com/gtrichards/PHYS_440_540/blob/master/homeworks/PHYS_440_540_F24_HW1.ipynb), which you can also find in the `/homeworks` folder as the notebook `PHYS_440_540_F24_HW1.ipynb`.  The other homeworks will be similarly named and in the same place.

## Summary

At this point, all of you should have:
* A `GitHub` account and a basic understanding of `git`
* Installed the Anaconda Python Distribution with Python 3
* Installed AstroML
* The ability to create and run Jupyter notebooks
* The ability to submit completed notebooks for class participation credit (once you have done Homework 1).

Now we're ready to do some astro-statistics!