# Exploring AI & ML Job Trends in the U.S.

## Notebook Version: v1  
**Focus**: Dataset loading and basic structural preview  

This notebook is part of a versioned project exploring trends in AI/ML job postings in the U.S.  
This version focuses on loading the dataset, checking its structure, and identifying surface-level issues.


In [6]:
# Import the necessary libraries
import numpy as np 
import pandas as pd 

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/ai-and-ml-job-listings-usa/ai_ml_jobs_linkedin.csv


## Dataset Overview

- Source: Kaggle – AI and ML Job Listings USA  
- File path: `/kaggle/input/ai-and-ml-job-listings-usa/ai_ml_jobs_linkedin.csv`

## Load and Preview Data

Load the dataset into a DataFrame and preview its structure to understand the basic layout.


In [7]:
# Load the dataset
us_jobs_df = pd.read_csv('/kaggle/input/ai-and-ml-job-listings-usa/ai_ml_jobs_linkedin.csv')

# Create a working copy
jobs_df = us_jobs_df.copy()

In [8]:
# Preview first 2 rows
jobs_df.head(2)


Unnamed: 0,title,location,publishedAt,companyName,description,applicationsCount,contractType,experienceLevel,workType,sector
0,AI/ML Engineer,"New York, NY",2024-05-29,Wesper,THE OPPORTUNITY\n\nWesper is looking for a sma...,Over 200 applicants,Full-time,Mid-Senior level,Engineering and Information Technology,Internet Publishing
1,Software Engineer - AI/ML Systems,"Redwood City, CA",,Snorkel AI,We're on a mission to democratize AI by buildi...,51 applicants,Full-time,Entry level,Engineering and Information Technology,Software Development


In [9]:
# Check dataset shape
print(f"Rows: {jobs_df.shape[0]}, Columns: {jobs_df.shape[1]}")

# Data types and non-null info
jobs_df.info()

Rows: 862, Columns: 10
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 862 entries, 0 to 861
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   title              862 non-null    object
 1   location           862 non-null    object
 2   publishedAt        850 non-null    object
 3   companyName        861 non-null    object
 4   description        862 non-null    object
 5   applicationsCount  862 non-null    object
 6   contractType       862 non-null    object
 7   experienceLevel    862 non-null    object
 8   workType           862 non-null    object
 9   sector             859 non-null    object
dtypes: object(10)
memory usage: 67.5+ KB


In [12]:
# Summary stats; every column is object dtype, so describe() reports count/unique/top/freq
jobs_df.describe()


Unnamed: 0,title,location,publishedAt,companyName,description,applicationsCount,contractType,experienceLevel,workType,sector
count,862,862,850,861,862,862,862,862,862,859
unique,450,164,142,519,748,145,5,7,55,156
top,Machine Learning Engineer,United States,2024-05-22,"Unreal Staffing, Inc",Grammarly is excited to offer a remote-first h...,Over 200 applicants,Full-time,Mid-Senior level,Engineering and Information Technology,Software Development
freq,146,140,136,45,17,371,744,403,557,197


## Initial Observations and Notes

- The dataset contains **862 rows** and **10 columns**.
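- Three columns contain missing values: `publishedAt` (12 missing), `sector` (3 missing), and `companyName` (1 missing).
- Every column is loaded as `object`. `applicationsCount` holds text such as "Over 200 applicants" and `publishedAt` holds date strings, so both need data type conversions in the next version.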
- No immediate data loading issues were encountered.
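
The missing-value observation above can be checked directly rather than read off `info()`. The sketch below is only illustrative: `sample_df` is a small hypothetical stand-in for `jobs_df` (the third row is made up), so that the cell runs without the Kaggle file. On the real data, the same two lines would be applied to `jobs_df`.

```python
import pandas as pd

# Hypothetical stand-in for jobs_df; the third row is invented for illustration.
sample_df = pd.DataFrame({
    "title": ["AI/ML Engineer", "Software Engineer - AI/ML Systems", "Data Scientist"],
    "publishedAt": ["2024-05-29", None, "2024-05-22"],
    "companyName": ["Wesper", "Snorkel AI", None],
    "sector": ["Internet Publishing", "Software Development", "Software Development"],
})

# Count missing values per column, then keep only the columns that have any.
missing_counts = sample_df.isna().sum()
missing_report = missing_counts[missing_counts > 0].sort_values(ascending=False)

print(missing_report)
```

Filtering to columns with at least one missing value keeps the report short when most columns are complete, as they are in this dataset.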

## Next Steps

In the upcoming version:
- Handle missing values
- Convert data types (e.g., `publishedAt` to datetime)
- Clean or rename columns where necessary
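
As a rough preview of the type conversions listed above, here is a minimal sketch. It assumes the two `applicationsCount` formats visible in the preview ("Over 200 applicants" and "51 applicants") are representative. The column names `applicationsNum` and `applicationsCapped` are hypothetical, and `sample_df` is a small stand-in for `jobs_df`.

```python
import pandas as pd

# Hypothetical stand-in for jobs_df, using values shown in the preview above.
sample_df = pd.DataFrame({
    "publishedAt": ["2024-05-29", None],
    "applicationsCount": ["Over 200 applicants", "51 applicants"],
})

# Parse date strings; unparseable or missing entries become NaT.
sample_df["publishedAt"] = pd.to_datetime(sample_df["publishedAt"], errors="coerce")

# Pull the first run of digits out of the text and store it as a nullable integer.
digits = sample_df["applicationsCount"].str.extract(r"(\d+)", expand=False)
sample_df["applicationsNum"] = pd.to_numeric(digits, errors="coerce").astype("Int64")

# "Over 200" is a lower bound, not an exact count, so flag those rows separately.
sample_df["applicationsCapped"] = sample_df["applicationsCount"].str.startswith("Over")

print(sample_df.dtypes)
```

Keeping a separate flag for the "Over ..." rows avoids treating a capped value as an exact count in later analysis.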
