### Usage of the Notebook and Importance of `username.csv`

This notebook helps identify online presence using Maigret, an OSINT tool for username reconnaissance. The workflow fuzzes the input usernames to generate variations, runs Maigret on each variation to check hundreds of online platforms, and filters the results by HTTP status code to shortlist likely active profiles.

#### **Usage Instructions**
1. **Prepare the Input File:**
   - Create a file named `username.csv` containing the initial list of usernames you want to investigate.
   - Ensure the file includes a single column with the header `username`, listing one username per row.
   
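2. **Upload the File:**
   - You do not need to upload anything in advance. When the fuzzing cell runs, it calls `files.upload()`, which opens a file picker. Select your `username.csv` there. The notebook reads the first file you upload, whatever its name; the run shown below used `usernames.csv`.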

3. **Run the Notebook:**
   - Run the cells in order:
     1. Install Maigret.
     2. Generate the fuzzed variations, which are saved to `fuzzed_usernames.csv`.
     3. Run Maigret on every variation.
     4. Combine and filter the reports.

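4. **Download Results:**
   - After processing, download `filtered_results.csv` (the rows with HTTP status 200) for further analysis. `master_results.csv` holds every combined result if you need the full picture. Both files appear in the Colab file browser.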
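For reference, a minimal `username.csv` has one `username` header row followed by one username per row. The sketch below is a hypothetical pre-flight check: the helper name `load_usernames` is ours, not part of the notebook or Maigret. It uses only Python's standard `csv` module to do three things before you upload the file:

- confirm that the `username` header exists;
- strip stray whitespace;
- drop blank and duplicate rows.

```python
import csv
import io

# Example contents of username.csv: a header row, then one username per row
SAMPLE_CSV = """username
yberFox123
IronEagle99

IronEagle99
"""

def load_usernames(csv_text):
    """Hypothetical check: return cleaned, de-duplicated usernames in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "username" not in reader.fieldnames:
        raise ValueError("username.csv must have a 'username' header")
    # DictReader already skips fully blank lines; strip whitespace and drop empty values
    names = (row["username"].strip() for row in reader if row.get("username"))
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(name for name in names if name))

print(load_usernames(SAMPLE_CSV))  # ['yberFox123', 'IronEagle99']
```

Extra columns, such as a leftover index column, are ignored. Only the `username` header is required.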

#### **Why Is `username.csv` Necessary?**
The `username.csv` file ensures that:
- **Scalability:** You can easily handle large lists of usernames without hardcoding them in the notebook.
- **Reusability:** The notebook can be reused for multiple investigations by simply replacing the file.
- **Error Prevention:** A structured input format reduces the likelihood of errors when providing usernames.
  
This approach makes the notebook versatile and user-friendly, especially for analysts working with bulk username data.

In [5]:
# Clone the Maigret repository
!git clone https://github.com/soxoj/maigret

# Install Maigret from the local clone (pip installs straight from the ./maigret/ path; no directory change is needed)
!pip3 install ./maigret/

fatal: destination path 'maigret' already exists and is not an empty directory.
Processing ./maigret
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: maigret
  Building wheel for maigret (pyproject.toml) ... done
  Created wheel for maigret: filename=maigret-0.4.4-py3-none-any.whl size=181418 sha256=5fd01563b1024a9c1309f01bf362c8b34890a1b31a903d6c9fe4c5425cf8ba9b
  Stored in directory: /tmp/pip-ephem-wheel-cache-znvip5fs/wheels/31/94/e6/499efa002515c3fc090bd0c478ac07bfd11faacaeec3b9809e
Successfully built maigret
Installing collected packages: maigret
  Attempting uninstall: maigret
    Found existing installation: maigret 0.4.4
    Uninstalling maigret-0.4.4:
      Successfully uninstalled maigret-0.4.4
Successfully installed maigret-0.4.4
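The `fatal: destination path 'maigret' already exists` message above is harmless, because pip still installs the existing clone. However, it reappears every time the cell is re-run.

Below is a minimal sketch of an idempotent alternative. It assumes `git` is on the PATH, as it is in Colab. `clone_if_missing` is a hypothetical helper, not part of Maigret.

```python
import os
import subprocess

def clone_if_missing(repo_url, dest):
    """Clone repo_url into dest only if dest does not exist yet.

    Returns True if a clone was performed, False if dest was already present.
    """
    if os.path.isdir(dest):
        return False
    # List-form arguments avoid shell parsing; check=True raises CalledProcessError if git fails
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True

# Usage in the install cell, followed by the same local install as before:
# clone_if_missing("https://github.com/soxoj/maigret", "maigret")
# !pip3 install ./maigret/
```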


In [None]:
# Import necessary libraries
import pandas as pd
import re

# User-defined variable to specify the number of iterations for appending numbers
NUMERIC_ITERATIONS = 100  # Adjust this number to control the fuzzing range

# Function to fuzz usernames
def fuzz_username(username, iterations):
    """Generate fuzzed variations of a username based on OSINT principles."""
    # Case variants of the original username
    base_variants = [
        username.lower(),  # All lowercase
        username.upper(),  # All uppercase
        username.capitalize(),  # First letter capitalized
    ]

    # Simplified leetspeak substitution mapping
    leetspeak_map = str.maketrans("aeios", "43105")  # Ensures equal lengths
    base_variants.append(username.translate(leetspeak_map))  # Leetspeak version

    # Split on common delimiters or camel case and generate word combinations
    split_words = re.split(r'[_.\-]|(?<=[a-z])(?=[A-Z])', username)
    combined_variants = [
        ''.join(split_words),  # Remove delimiters
        '_'.join(split_words),  # Join with underscores
        '-'.join(split_words),  # Join with hyphens
    ]

    # Generate simple fuzzing variations
    number_variants = [f"{username}{i}" for i in range(iterations)]  # Append numbers
    reversed_variants = [username[::-1]]  # Reverse the username

    # Combine all generated variants
    all_variants = set(base_variants + combined_variants + number_variants + reversed_variants)
    return list(all_variants)

# Upload the usernames file through Colab's file picker (the first uploaded file is used)
from google.colab import files
uploaded = files.upload()

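# The uploaded file must have a column named 'username'
file_name = list(uploaded.keys())[0]
usernames_data = pd.read_csv(file_name, dtype=str)  # read as text so names like "007" keep leading zeros
if 'username' not in usernames_data.columns:
    raise ValueError(f"'{file_name}' has no 'username' column")

# Strip stray whitespace and drop empty rows before fuzzing
usernames_data['username'] = usernames_data['username'].str.strip()
usernames_data = usernames_data[usernames_data['username'].notna() & (usernames_data['username'] != '')].copy()

# Apply the fuzzer to the dataset
usernames_data['fuzzed_usernames'] = usernames_data['username'].apply(
    lambda x: fuzz_username(x, NUMERIC_ITERATIONS)
)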

# Expand the fuzzed usernames for easier analysis
expanded_fuzzed_usernames = usernames_data.explode('fuzzed_usernames')

# Display the resulting fuzzed usernames
from IPython.display import display
display(expanded_fuzzed_usernames)

# Optionally, save the fuzzed usernames to a new CSV file
output_file_name = "fuzzed_usernames.csv"
expanded_fuzzed_usernames.to_csv(output_file_name, index=False)
print(f"Fuzzed usernames saved to {output_file_name}")


Saving usernames.csv to usernames (1).csv


Unnamed: 0,username,fuzzed_usernames
0,yberFox123,yberFox12317
0,yberFox123,yberFox1239
0,yberFox123,yberFox12335
0,yberFox123,yberFox12367
0,yberFox123,yberFox12354
...,...,...
9,IronEagle99,IronEagle9943
9,IronEagle99,IronEagle997
9,IronEagle99,IronEagle9971
9,IronEagle99,IronEagle9989


Fuzzed usernames saved to fuzzed_usernames.csv
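To see what `fuzz_username` does at each step, the snippet below runs the same split pattern and leetspeak table in isolation. It also exposes one limitation: `str.maketrans("aeios", "43105")` maps only lowercase letters, so capitals such as the `E` in `IronEagle99` pass through unchanged. Translating `username.lower()` instead is one possible variation; that is our suggestion, not what the notebook does.

The last lines estimate the workload, and they show that `NUMERIC_ITERATIONS` dominates the number of Maigret runs. According to the Maigret README, each default search also covers its 500 most popular sites. The helper `variant_upper_bound` is hypothetical.

```python
import re

# Same delimiter / camelCase split pattern and leetspeak table as fuzz_username
SPLIT_PATTERN = r'[_.\-]|(?<=[a-z])(?=[A-Z])'
LEET = str.maketrans("aeios", "43105")

print(re.split(SPLIT_PATTERN, "yberFox123"))     # ['yber', 'Fox123']
print(re.split(SPLIT_PATTERN, "iron.eagle_99"))  # ['iron', 'eagle', '99']

# The translation table only covers lowercase letters
print("IronEagle99".translate(LEET))             # Ir0nE4gl399
print("IronEagle99".lower().translate(LEET))     # 1r0n34gl399

def variant_upper_bound(iterations):
    """Hypothetical helper: at most 3 case + 1 leetspeak + 3 delimiter + 1 reversed
    variants, plus `iterations` numbered ones (the set() in fuzz_username may remove duplicates)."""
    return 8 + iterations

# Ten input rows (index 0 to 9 in the run above) with NUMERIC_ITERATIONS = 100
print(10 * variant_upper_bound(100))             # 1080
```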


In [15]:
import pandas as pd
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

# Load the fuzzed usernames from the CSV
input_file = "fuzzed_usernames.csv"
fuzzed_data = pd.read_csv(input_file)

# User-defined parameters
GROUP_SIZE = 50
OUTPUT_DIR = "maigret_results"
MAX_WORKERS = 10  # Number of threads to run in parallel

# Ensure output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Split the unique, non-empty usernames into manageable groups
unique_usernames = fuzzed_data['fuzzed_usernames'].dropna().drop_duplicates().tolist()
groups = [unique_usernames[i:i+GROUP_SIZE] for i in range(0, len(unique_usernames), GROUP_SIZE)]

# Function to run Maigret for a single username.
# --folderoutput takes a directory; Maigret writes its CSV report for each username into it.
def run_maigret(username, group_output_dir):
    # Pass arguments as a list (no shell) so characters inside fuzzed usernames are never interpreted by a shell
    command = ["maigret", username, "--csv", "--folderoutput", group_output_dir]
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
        return f"Success: {username}"
    except subprocess.CalledProcessError as e:
        return f"Error: {username}, {e}"

# Process groups with threading
for group_idx, group in enumerate(groups):
    group_output_dir = os.path.join(OUTPUT_DIR, f"group_{group_idx+1}")
    print(f"Processing Group {group_idx+1} with {len(group)} usernames...")

    # Use ThreadPoolExecutor to process usernames in parallel
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = {executor.submit(run_maigret, username, group_output_dir): username for username in group}

        for future in as_completed(futures):
            username = futures[future]
            try:
                result = future.result()
                print(result)
            except Exception as exc:
                print(f"Exception occurred for {username}: {exc}")

print(f"All groups processed. Results saved in '{OUTPUT_DIR}' directory.")


Processing Group 1 with 50 usernames...
Success: yberFox1239
Success: yberFox12319
Success: yberFox12345
Success: yberFox12354
Success: yberFox12326
Success: yberFox12367
Success: yberFox12362
Success: yberFox12338
Success: yberFox12335
Success: yberFox12317
Success: yberFox12311
Success: yberFox12334
Success: yberFox12365
Success: yberFox1233
Success: yberFox12393
Success: yberFox12328
Success: yberFox1238
Success: yberFox12382
Success: yberFox12340
Success: yberFox12329
Success: yberFox12355
Success: yberFox12364
Success: yberFox12375
Success: yberFox12321
Success: yberFox12356
Success: yberFox12339
Success: yberFox12315
Success: yberFox12377
Success: yberFox12359
Success: yberFox12346
Success: yberFox1230
Success: yberFox12363
Success: yberFox12376
Success: yberFox12310
Success: yberFox12398
Success: yberFox12330
Success: yberFox1236
Success: yberFox12347
Success: yber_Fox123
Success: yberFox12379
Success: yberFox12323
Success: yberFox12391
Success: yberFox12348
Success: yberFox1238
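Building the Maigret command as one string and running it with `shell=True` lets the shell interpret characters inside a username. Passing an argument list without a shell avoids that. Fuzzed names are usually alphanumeric, but input files are not always clean.

The stdlib-only sketch below contrasts the two forms:

- `build_maigret_argv` is a hypothetical helper;
- the hostile username is a made-up example;
- `--csv` and `--folderoutput` are the same flags the notebook already passes.

```python
import shlex

def build_maigret_argv(username, output_dir):
    """Hypothetical helper: argv for subprocess.run with no shell, so each item reaches Maigret verbatim."""
    return ["maigret", username, "--csv", "--folderoutput", output_dir]

# Deliberately hostile example input, not one of the notebook's usernames
hostile = "alice; rm -rf ~"
argv = build_maigret_argv(hostile, "maigret_results/group_1")

# As a list element, the whole string stays one argument, so ';' never separates commands
print(argv[1])           # alice; rm -rf ~

# A shell string would need explicit quoting to be equally safe; shlex.join shows that quoting
print(shlex.join(argv))  # maigret 'alice; rm -rf ~' --csv --folderoutput maigret_results/group_1
```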

In [17]:
import pandas as pd
import os
import glob

# Directory containing batch results
OUTPUT_DIR = "maigret_results"
MASTER_CSV = "master_results.csv"
FILTERED_CSV = "filtered_results.csv"

# Combine all batch results into a master CSV
print("Combining batch results into a master CSV...")
all_files = glob.glob(os.path.join(OUTPUT_DIR, "**", "*.csv"), recursive=True)  # Recursively find all CSV files

# glob also matches directories whose names end in .csv (e.g. output folders named group_N.csv), so keep regular files only
all_files = [file for file in all_files if os.path.isfile(file)]

all_dataframes = []
for file in all_files:
    try:
        df = pd.read_csv(file)
        all_dataframes.append(df)
    except Exception as e:
        print(f"Error reading {file}: {e}")

if not all_dataframes:
    # Raise instead of calling exit(): in a notebook, exit() can shut down the kernel rather than just stopping this cell
    raise FileNotFoundError("No batch files found. Ensure the OUTPUT_DIR contains CSV files.")

master_df = pd.concat(all_dataframes, ignore_index=True)
master_df.to_csv(MASTER_CSV, index=False)
print(f"Master CSV saved to {MASTER_CSV}")

# Filter rows with HTTP code 200
print("Filtering results with HTTP status code 200...")
if "http_status" in master_df.columns:
    # Coerce to numbers so statuses stored as text or floats ("200", 200.0) still match; blanks become NaN
    status = pd.to_numeric(master_df['http_status'], errors='coerce')
    filtered_df = master_df[status == 200]
    filtered_df.to_csv(FILTERED_CSV, index=False)
    print(f"Filtered results saved to {FILTERED_CSV}")
else:
    print("The 'http_status' column is missing from the results. Ensure Maigret output includes HTTP codes.")


Combining batch results into a master CSV...
Master CSV saved to master_results.csv
Filtering results with HTTP status code 200...
Filtered results saved to filtered_results.csv
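HTTP 200 on its own is a weak signal: some sites return 200 for any profile URL, whether or not the account exists. Maigret applies its own per-site checks for exactly this reason.

The stdlib-only sketch below performs the combine-and-filter step and then counts hits per username. It makes these assumptions:

- Each report has the `http_status` and `username` columns. The combine cell above relies on `http_status`; `username` is our assumption.
- Optionally, an `exists` column holds Maigret's check status (for example `Claimed`). That column name is also our assumption.

Check the headers in your own `master_results.csv` before relying on them. `combine_reports` and the sample rows are hypothetical.

```python
import csv
import glob
import os
import tempfile
from collections import Counter

def combine_reports(output_dir, claimed_only=False):
    """Collect rows with HTTP status 200 from every CSV report under output_dir.

    If claimed_only is True, rows that have an 'exists' column (an assumed
    Maigret report column) must also read 'Claimed' to be kept.
    """
    kept = []
    pattern = os.path.join(output_dir, "**", "*.csv")
    for path in sorted(glob.glob(pattern, recursive=True)):
        if not os.path.isfile(path):  # skip directories whose names end in .csv
            continue
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                try:
                    status = int(float(row.get("http_status") or ""))
                except (ValueError, OverflowError):
                    continue  # blank or non-numeric status
                if status != 200:
                    continue
                if claimed_only and "exists" in row and row["exists"] != "Claimed":
                    continue
                kept.append(row)
    return kept

# Demo on made-up sample rows (not real Maigret output)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "group_1"))
    with open(os.path.join(tmp, "group_1", "sample_report.csv"), "w", newline="", encoding="utf-8") as fh:
        fh.write(
            "username,name,http_status,exists\n"
            "IronEagle99,SiteA,200,Claimed\n"
            "IronEagle99,SiteB,200,Available\n"
            "IronEagle99,SiteC,404,Available\n"
        )
    rows = combine_reports(tmp)
    claimed = combine_reports(tmp, claimed_only=True)
    print(len(rows), len(claimed))                    # 2 1
    print(Counter(row["username"] for row in rows))  # Counter({'IronEagle99': 2})
```

If the reports have no `exists` column, `claimed_only` has no effect and only the HTTP 200 filter applies.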
