# EMR SSH Access + HDFS Basics


## Download the PEM key

Download **`emr_training1.pem`** from the shared folder and save it locally.

### Recommended location
- **macOS/Linux**: `~/Downloads/emr_training1.pem`

You will use this file to authenticate your SSH connection to the EMR node.


In [None]:
# Download emr_training1.pem from the shared folder (manual step)

## Secure the PEM file (permissions)

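SSH requires the private key to be readable only by you. If the file's permissions are too open, `ssh` refuses to use the key.

On **macOS/Linux**, run `chmod 600` from the directory where you saved the key (or pass the full path to the file):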

- `600` means: **read/write for owner only**


In [None]:
chmod 600 emr_training1.pem
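
To see what the mode change does without touching the real key, the sketch below applies the same `chmod 600` to a throwaway file and prints the permission string before and after. The temporary file created with `mktemp` is a stand-in assumed for illustration; it is not part of the training setup.

```shell
# Throwaway stand-in for the key (not the real emr_training1.pem)
demo_key="$(mktemp)"

# Simulate a freshly downloaded file that group and others can read
chmod 644 "$demo_key"
before="$(ls -l "$demo_key" | cut -c1-10)"

# Same command as for the real key: read/write for the owner only
chmod 600 "$demo_key"
after="$(ls -l "$demo_key" | cut -c1-10)"

echo "before: $before"
echo "after:  $after"

rm -f "$demo_key"
```

The `after` line should read `-rw-------`: read/write for the owner and nothing for group or others.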

## Connect to the EMR primary node using SSH

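Connect to the EMR primary node as the `hadoop` user.

- Make sure the PEM path is correct for your machine. The command below assumes you run it from the directory that contains the key (for example `~/Downloads`); otherwise pass the full path to `-i`.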


In [None]:
ssh -i emr_training1.pem hadoop@18.142.114.243
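
A quick pre-flight check can catch a wrong key path or loose permissions before `ssh` complains. The sketch below is one possible way to do it; `key_ready` is a hypothetical helper, and it treats only mode `600` as ready even though stricter modes such as `400` also work with SSH.

```shell
# Hypothetical helper: succeeds only if the file exists with mode 600
key_ready() {
  [ -f "$1" ] && [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}

KEY="$HOME/Downloads/emr_training1.pem"   # recommended download location
if key_ready "$KEY"; then
  echo "Key looks ready: ssh -i $KEY hadoop@18.142.114.243"
else
  echo "Key missing or not mode 600: $KEY"
fi
```

The script only prints the command, so it is safe to run before you actually connect.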

## Verify Hadoop installation

After logging in, confirm Hadoop is installed and available.


In [None]:
hadoop version

## HDFS basics: list files

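`hadoop fs` is the file system shell used for **HDFS** operations.

- `-ls` lists files and directories in HDFS. With no path argument, it lists your HDFS home directory (`/user/hadoop` for the `hadoop` user).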


In [None]:
hadoop fs -ls

## Create your own HDFS folder

Create a personal folder under `/user/hadoop/`.

Replace `<your_name>` (including the angle brackets) with your name; avoid spaces.
Example: `/user/hadoop/nagabhushan`


In [None]:
hadoop fs -mkdir /user/hadoop/<your_name>

hadoop fs -mkdir /user/hadoop/<your_name>/stocks_data
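
As an alternative sketch, `-mkdir -p` creates the parent and the child folder in one step and does not fail if they already exist. Holding the name in a shell variable also avoids typing the angle brackets, which the shell would otherwise treat as redirection. The `NAME` and `TARGET` variables are assumptions for illustration, and the guard lets the script run on a machine without Hadoop.

```shell
NAME="nagabhushan"   # example name from this document; replace with yours
TARGET="/user/hadoop/${NAME}/stocks_data"

if command -v hadoop >/dev/null 2>&1; then
  # -p creates missing parent directories, like Unix mkdir -p
  hadoop fs -mkdir -p "$TARGET"
else
  echo "hadoop not found; would run: hadoop fs -mkdir -p $TARGET"
fi
```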

## Upload a single CSV file to HDFS

Upload `nyse_sample_data.csv` to the `stocks_data` folder you created under `/user/hadoop/<your_name>/`.

- `-put` copies from the **local filesystem** → **HDFS**


In [None]:
hadoop fs -put nyse_sample_data.csv /user/hadoop/<your_name>/stocks_data/
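
After the upload it can help to confirm that the file actually landed in HDFS. The sketch below uses `hadoop fs -test -e`, which exits with status 0 when the path exists, followed by `-ls` to show its size and timestamp. The `NAME` and `REMOTE` variables are assumptions for illustration.

```shell
NAME="nagabhushan"   # example name from this document; replace with yours
REMOTE="/user/hadoop/${NAME}/stocks_data/nyse_sample_data.csv"

if command -v hadoop >/dev/null 2>&1; then
  if hadoop fs -test -e "$REMOTE"; then
    hadoop fs -ls "$REMOTE"
  else
    echo "Not found in HDFS: $REMOTE"
  fi
else
  echo "hadoop not found; would check $REMOTE"
fi
```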
 

## Read the uploaded file using the `cat` command

`hdfs dfs` is equivalent to `hadoop fs` when the target file system is HDFS.

In [None]:
hdfs dfs -cat /user/hadoop/<your_name>/stocks_data/nyse_sample_data.csv

## Upload the `retail_db/` folder to your HDFS folder

This copies the entire `retail_db/` directory (including subfolders) into your HDFS location, so the files end up under `/user/hadoop/<your_name>/retail_db/`.


In [None]:
hadoop fs -put retail_db/ /user/hadoop/<your_name>
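
To check that the subfolders were copied, a recursive listing and a size summary are usually enough. This is a minimal sketch, assuming the upload went to the personal folder created earlier; `NAME` and `DEST` are illustrative variables.

```shell
NAME="nagabhushan"   # example name from this document; replace with yours
DEST="/user/hadoop/${NAME}/retail_db"

if command -v hadoop >/dev/null 2>&1; then
  # -R walks subdirectories such as orders/ and customers/
  hadoop fs -ls -R "$DEST"
  # -s summarizes the whole tree, -h prints human-readable sizes
  hadoop fs -du -s -h "$DEST"
else
  echo "hadoop not found; would inspect $DEST"
fi
```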

## Preview file contents from HDFS

### Read the full file (`cat`)
Use `-cat` to print the entire file to the terminal (best for small files).


In [None]:
hadoop fs -cat /user/hadoop/<your_name>/retail_db/orders/part-00000
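
For larger files, one common pattern is to pipe `-cat` into the local `head` command so that only the first few lines are printed. The sketch below assumes the `retail_db` upload above succeeded; `NAME`, `FILE`, and the line count of 5 are illustrative choices.

```shell
NAME="nagabhushan"   # example name from this document; replace with yours
FILE="/user/hadoop/${NAME}/retail_db/orders/part-00000"

if command -v hadoop >/dev/null 2>&1; then
  # Stream the file from HDFS, but keep only the first 5 lines
  hadoop fs -cat "$FILE" | head -n 5
else
  echo "hadoop not found; would preview $FILE"
fi
```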

### View last lines (`tail`)
Use `-tail` to print the last kilobyte of a file.


In [None]:
hadoop fs -tail /user/hadoop/<your_name>/retail_db/customers/part-00000

### View first lines (`head`)
Use `-head` to print the first kilobyte of a file (useful for header/schema preview).


In [None]:
hadoop fs -head /user/hadoop/<your_name>/retail_db/order_items/part-00000
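
Both `-tail` and `-head` cut by size (one kilobyte) rather than by line count, so the output can start or end in the middle of a row. The sketch below is a local analogy only: it uses the ordinary `head -c` and `tail -c` commands on a generated sample file, not Hadoop itself, and the file contents are made up for illustration.

```shell
sample="$(mktemp)"

# Generate 200 made-up rows, well over 1 KB in total
i=1
while [ "$i" -le 200 ]; do
  echo "row-$i,some,comma,separated,values"
  i=$((i + 1))
done > "$sample"

# Take exactly 1024 bytes from each end, regardless of line boundaries
first_kb_bytes="$(head -c 1024 "$sample" | wc -c | tr -d ' ')"
last_kb_bytes="$(tail -c 1024 "$sample" | wc -c | tr -d ' ')"
echo "first chunk: $first_kb_bytes bytes, last chunk: $last_kb_bytes bytes"

# The final line of the first chunk is usually an incomplete row
head -c 1024 "$sample" | tail -n 1
echo

rm -f "$sample"
```

If you need a specific number of complete lines instead, piping `-cat` into the local `head -n` or `tail -n` is a simple alternative.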