# Ungraded Lab: GitHub Integration Lab

## Overview 
Welcome to EngageMetrics' GitHub Integration Lab! In this hands-on session, you'll learn to manage version control for data science projects using GitHub. You'll work with the Employee Insights dataset, creating and maintaining a repository for your analysis notebook while learning best practices for version control in data science workflows.
If you get stuck or need a refresher, refer back to the screencast where we demonstrated these concepts step-by-step. The video shows similar examples that can help guide you through this activity.

## Learning Outcomes 
By the end of this lab, you will be able to:
- Initialize and structure a Git repository for a data science project
- Write clear, descriptive commit messages following data science best practices
- Push changes to GitHub while managing notebook metadata
- Implement version control best practices for Jupyter notebooks

## Dataset Information 
You'll be working with the <b>employee_insights_cleaned.csv</b> dataset from EngageMetrics, containing employee satisfaction scores, work modes, and performance metrics.

## Activities
### Activity 1: Repository Setup and Initial Commit

<b>Step 1:</b> Initialize the Repository

In [None]:
!mkdir engagemetrics_analysis
%cd engagemetrics_analysis
!mv ../employee_analysis.ipynb .

<b>Step 2:</b> Initialize Git Repository

In [None]:
!git init
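
If you'd like to confirm from inside the notebook that the repository was created, one option is to look for the <b>.git</b> folder that <b>git init</b> adds. This is a minimal sketch, assuming the notebook's working directory is the project folder; the helper name <b>is_git_repo</b> is made up for this example and is not part of Git or Jupyter.

```python
from pathlib import Path


def is_git_repo(path="."):
    """Return True if `path` contains a .git directory (hypothetical helper)."""
    # A plain `git init` creates a .git directory at the top of the project.
    return (Path(path) / ".git").is_dir()


# Checks the current working directory of the notebook.
print("Git repository found:", is_git_repo())
```

This only covers the simple case of a freshly initialized repository; running <b>!git status</b> remains the more complete check.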

### Activity 2: First Analysis and Commit

<b>Step 1:</b> Create Basic Analysis

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv('../employee_insights_cleaned.csv')

# Display basic information and statistics  
# YOUR CODE HERE

<b>Step 2:</b> Stage and Commit
- Open a terminal or command prompt from the "Launcher". Confirm that you are in the <b>engagemetrics_analysis</b> directory by typing <b>pwd</b> and pressing Enter.
- Once you are in the correct directory, stage and commit the notebook with the following commands:

In [None]:
# In a terminal, git commands do not need the "!" prefix.
git add employee_analysis.ipynb
git commit -m "Initial analysis: Add dataset overview and basic statistics"

<b>Tip:</b> Write commit messages that clearly describe what analysis or changes were made.
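
To make the tip concrete, here is a small sketch that flags commit messages that are empty, very long, or too generic. The rules (a 72-character subject line, a short list of vague words) are illustrative assumptions rather than anything Git enforces, and <b>check_commit_message</b> is a hypothetical helper written for this example.

```python
def check_commit_message(message, max_subject_len=72):
    """Return a list of problems found in a commit message (empty list = looks fine).

    The rules below are illustrative assumptions, not requirements imposed by Git.
    """
    problems = []
    lines = message.strip().splitlines()
    subject = lines[0].strip() if lines else ""

    if not subject:
        problems.append("subject line is empty")
    elif len(subject) > max_subject_len:
        problems.append(f"subject line is longer than {max_subject_len} characters")

    # One-word subjects like these say nothing about what actually changed.
    if subject.lower() in {"update", "fix", "changes", "wip"}:
        problems.append("subject line is too generic to describe the change")

    return problems


print(check_commit_message("Initial analysis: Add dataset overview and basic statistics"))
print(check_commit_message("update"))
```

The first message passes with no problems reported, while "update" is flagged as too generic.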

### Activity 3: Remote Repository Setup

<b>Step 1:</b> Create GitHub Repository
- Visit github.com and create a new repository named "engagemetrics-analysis". Do not initialize it with a README, since your local repository already has a commit to push.
- Follow GitHub's instructions to connect your local repository

<b>Step 2:</b> Push your changes from the terminal (again, make sure you are in the <b>engagemetrics_analysis</b> directory).


In [None]:
git remote add origin <your-repository-url>
git branch -M main

You will need to authenticate before you can push the committed changes to the repository. Have your GitHub username and personal access token ready before pushing to the main branch.

In [None]:
git push -u origin main

## Success Checklist
- Repository is properly initialized (optionally with a .gitignore for checkpoint and cache files)
- Initial analysis notebook is committed and pushed
- Commit messages are clear and descriptive
- Remote repository is properly connected
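
The checklist mentions a .gitignore file. A minimal sketch of creating one from Python is shown below, assuming a typical Jupyter project where <b>.ipynb_checkpoints/</b> and <b>__pycache__/</b> should stay out of version control. The pattern list and the helper name <b>write_gitignore</b> are assumptions for this example; adjust them for your own project.

```python
from pathlib import Path

# Patterns assumed for a typical Jupyter project; adjust for your own setup.
IGNORE_PATTERNS = [".ipynb_checkpoints/", "__pycache__/"]


def write_gitignore(repo_dir="."):
    """Create or extend .gitignore in repo_dir with IGNORE_PATTERNS (hypothetical helper)."""
    path = Path(repo_dir) / ".gitignore"
    # Keep any lines that are already there and add only the missing patterns.
    existing = path.read_text().splitlines() if path.exists() else []
    missing = [p for p in IGNORE_PATTERNS if p not in existing]
    path.write_text("\n".join(existing + missing) + "\n")
    return path
```

Calling <b>write_gitignore()</b> from the project folder writes the file; you can then stage and commit it like any other file, for example with <b>git add .gitignore</b>.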

## Common Issues & Solutions 
- Problem: Notebook outputs and metadata cause noisy diffs or merge conflicts
    - Solution: Clear output cells before committing
- Problem: Push rejected because the remote has changes you do not have locally
    - Solution: Pull those changes first with <b>git pull origin main</b>, then push again
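
Clearing outputs is usually done from the Jupyter menu, but it can also be scripted. The sketch below assumes the notebook is stored in the standard version 4 JSON layout, with a top-level "cells" list; <b>clear_outputs</b> is a hypothetical helper for this example, not a Jupyter API. It overwrites the file in place, so try it on a copy first.

```python
import json


def clear_outputs(notebook_path):
    """Remove outputs and execution counts from every code cell, in place."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)

    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed results and plots
            cell["execution_count"] = None  # reset the In [n] counter

    with open(notebook_path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

For example, <b>clear_outputs("employee_analysis.ipynb")</b> would strip the outputs from the lab notebook before you stage it.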
    
## Summary 
Fantastic work! You've practiced the essential skills for managing a data science project with Git and GitHub. You can now take a project from initial setup to pushing changes to GitHub, which sets you up for collaborative data science work. These version control skills are standard practice for professional data scientists and will serve you well throughout your career!

### Key Points
- Initialize repositories before starting any data analysis
- Write clear, descriptive commit messages that explain your changes
- Keep commits focused and atomic for better version history
- Always use .gitignore to keep notebook checkpoint and cache files (such as .ipynb_checkpoints/) out of the repository
- Pull before pushing to avoid conflicts in collaborative work

## Solution Code
Stuck on your code or want to check your solution? Here's a complete reference implementation to guide you. This represents just one effective approach—try solving independently first, then use this to overcome obstacles or compare techniques. The solution is provided to help you move forward and explore alternative approaches to achieve the same results. Happy coding!

### Activity 1: Repository Setup and Initial Commit - Solution Code

In [None]:
# Step 1: Initialize the Repository
!mkdir engagemetrics_analysis
%cd engagemetrics_analysis
!mv ../employee_analysis.ipynb .


# Step 2: Initialize Git Repository
!git init

# Verify initialization
!git status

### Activity 2: First Analysis and Commit - Solution Code

In [None]:
# Step 1: Create Basic Analysis

# Initial Jupyter Notebook Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('../employee_insights_cleaned.csv')

# Display basic information
# df.info() prints its report directly and returns None, so it is not wrapped in print()
print("Dataset Overview:")
df.info()

# Basic statistics
print("\nBasic Statistics:")
print(df.describe())

# Check for missing values
print("\nMissing Values:")
print(df.isnull().sum())

# Step 2: Stage and Commit
# Run these in a terminal (the "!" prefix is only needed inside a notebook cell):
#   git add employee_analysis.ipynb
#   git commit -m "Initial analysis: Add dataset overview and basic statistics"

### Activity 3: Remote Repository Setup - Solution Code

In [None]:
# Step 1: Create GitHub Repository
# 1. Go to github.com
# 2. Click '+ New repository'
# 3. Name it 'engagemetrics-analysis'
# 4. Leave it public
# 5. Don't initialize with README

# Step 2: Push Your Changes
# Run these in a terminal (the "!" prefix is only needed inside a notebook cell):
#   git remote add origin <your-repository-url>
#   git branch -M main
#   git push -u origin main