# LAB 2: Processing Raw DNA Profiles\n\n## Browser-Based Version\n\nThis notebook is a simplified version of the full Lab 2 notebook, adapted to run in your browser using JupyterLite.\n\n### Key Learning Objectives:\n\n1. Learn how raw DNA profiles are structured and stored\n2. Understand the process of converting raw DNA data to standardized formats\n3. Explore how to determine biological sex from genetic data\n4. Visualize the key steps in preparing genetic data for analysis\n\n### Browser vs. Local Environment\n\nThis browser version uses pre-processed sample data to demonstrate the key concepts without requiring file system access or specialized bioinformatics tools. For the full experience with your own genetic data, download the complete notebook from the course materials and run it in your local Python environment.

In [ ]:
# Import libraries\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom io import StringIO\n\n# Display settings\npd.set_option('display.max_columns', 10)\npd.set_option('display.max_rows', 10)\n\nprint(\"Libraries imported successfully!\")

## Understanding Raw DNA Profile Formats\n\nConsumer DNA testing companies like 23andMe, AncestryDNA, and others provide raw data in various formats. In this section, we'll explore what a raw DNA profile looks like and the common formats you might encounter.
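Because each vendor announces itself in the leading comment lines of the file, a format can often be guessed before any parsing is attempted. The helper below is a minimal sketch of that idea; `detect_format` and its return labels are hypothetical names chosen for illustration, and the check assumes the vendor name appears somewhere in the `#` header block.

```python
def detect_format(raw_text):
    """Guess the vendor of a raw DNA export from its leading comment lines.

    Hypothetical helper: returns "ancestry", "23andme", or "unknown".
    """
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # blank lines can appear inside the header block
        if not line.startswith("#"):
            break  # first data line reached; the header is over
        lowered = line.lower()
        if "ancestrydna" in lowered:
            return "ancestry"
        if "23andme" in lowered:
            return "23andme"
    return "unknown"


print(detect_format("#AncestryDNA raw data download\nrs1,1,100,A,A"))
print(detect_format("# This data file generated by 23andMe\nrs1\t1\t100\tAA"))
print(detect_format("rs1,1,100,A,A"))
```

Stopping at the first data line keeps the check cheap even for files with hundreds of thousands of SNP rows.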

In [ ]:
# Sample raw DNA data snippets from different consumer testing companies\n\n# AncestryDNA format example\nancestry_data = \"\"\"#AncestryDNA raw data download\n#This file was generated by AncestryDNA at: 2023-01-01 13:45:00 UTC\n#Data version: 2022-10-01\n\n#Full id,chromosome,position,allele1,allele2\nrs11240777,1,798959,G,G\nrs6681049,1,800007,C,T\nrs4970383,1,838555,A,A\nrs12562034,1,846808,G,G\nrs3934834,1,995669,C,T\nrs9442372,1,1011278,G,A\nrs3737728,1,1061531,G,A\nrs11260588,1,1061928,C,C\nrs9651273,1,1072319,A,G\nrs12726255,1,1086864,G,A\n\"\"\"\n\n# 23andMe format example\n# (Python identifiers cannot start with a digit, so the variable is named twentythree_data)\ntwentythree_data = \"\"\"# This data file generated by 23andMe at: Tue Jan 1 13:45:00 2023\n# This file contains raw genotype data, including data that is not used in 23andMe reports.\n# This data has undergone a general quality review however only a subset of markers have been \n# individually validated for accuracy. As such, this data is suitable only for research, \n# educational, and informational use and not for medical or other use.\n\n# Below is a text version of your data. Fields are tab-separated\n# Each line corresponds to a single SNP.  For each SNP, we provide its identifier\n# (an rsid or an internal id), its location on the reference human genome, and the \n# genotype call oriented with respect to the plus strand on the human reference genome.\n# We are using reference human assembly GRCh37.p13 (also known as Genome Reference\n# Consortium Human Build 37 patch release 13).\n\n# rsid\tchromosome\tposition\tgenotype\nrs548049170\t1\t13302\tTT\nrs13328684\t1\t45411\tCC\nrs13301156\t1\t45431\tGG\nrs147121693\t1\t55299\tCG\nrs557514207\t1\t81080\tAA\nrs3107975\t1\t103380\tAA\nrs4970384\t1\t126930\tAA\nrs11807848\t1\t182550\tGG\nrs28714670\t1\t260294\tGG\nrs4475691\t1\t318699\tCC\n\"\"\"\n\n# Print samples of each format\nprint(\"AncestryDNA Format Sample:\")\nprint(ancestry_data[:500])\n\nprint(\"\\n23andMe Format Sample:\")\nprint(twentythree_data[:500])

## Parsing Raw DNA Profiles\n\nThe first step in processing raw DNA data is parsing the file and standardizing the format. Let's demonstrate how to parse raw DNA files from different testing companies into a standardized structure.

In [ ]:
# Function to parse AncestryDNA format\ndef parse_ancestry_data(data_string):\n    # Skip header lines (everything up to and including the '#Full id' column header)\n    lines = data_string.strip().split('\\n')\n    header_end = 0\n    for i, line in enumerate(lines):\n        if line.startswith('#Full id'):\n            header_end = i\n            break\n    \n    # Create DataFrame from the data, ignoring any blank lines\n    data_lines = lines[header_end+1:]\n    data = [line.split(',') for line in data_lines if line.strip()]\n    df = pd.DataFrame(data, columns=['rsid', 'chromosome', 'position', 'allele1', 'allele2'])\n    \n    # Convert position to integer\n    df['position'] = df['position'].astype(int)\n    \n    # Create genotype column\n    df['genotype'] = df['allele1'] + df['allele2']\n    \n    return df\n\n# Function to parse 23andMe format\ndef parse_23andme_data(data_string):\n    # Skip header lines (lines starting with #) and the blank lines between them\n    lines = data_string.strip().split('\\n')\n    data_lines = []\n    for line in lines:\n        if line.strip() and not line.startswith('#'):\n            data_lines.append(line)\n    \n    # Create DataFrame from the data\n    data = [line.split('\\t') for line in data_lines]\n    df = pd.DataFrame(data, columns=['rsid', 'chromosome', 'position', 'genotype'])\n    \n    # Convert position to integer\n    df['position'] = df['position'].astype(int)\n    \n    # Extract alleles from genotype\n    df['allele1'] = df['genotype'].str[0]\n    df['allele2'] = df['genotype'].str[1]\n    \n    return df\n\n# Parse the sample data\nancestry_df = parse_ancestry_data(ancestry_data)\ntwentythree_df = parse_23andme_data(twentythree_data)\n\n# Display the parsed data\nprint(\"Parsed AncestryDNA Data:\")\ndisplay(ancestry_df)\n\nprint(\"\\nParsed 23andMe Data:\")\ndisplay(twentythree_df)
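The hand-written parsers above make the file structure explicit, but the same result can usually be reached with `pandas.read_csv`, which can drop `#` comment lines and blank lines on its own. The following is a sketch for the tab-separated 23andMe layout only; `read_23andme_text` is a hypothetical helper name, and the tiny `sample` string is made up for the demonstration.

```python
from io import StringIO

import pandas as pd


def read_23andme_text(text):
    """Parse 23andMe-style text (tab-separated, '#' comments) into a DataFrame."""
    return pd.read_csv(
        StringIO(text),
        sep="\t",
        comment="#",      # drops the header block, including the '# rsid ...' line
        header=None,      # the column header is a comment, so names are supplied
        names=["rsid", "chromosome", "position", "genotype"],
        dtype={"rsid": str, "chromosome": str, "genotype": str},
    )


# Made-up two-SNP sample with a comment block and a blank line
sample = (
    "# header comment\n"
    "\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t100\tAG\n"
    "rs2\tX\t200\tCC\n"
)
parsed = read_23andme_text(sample)
print(parsed)
```

Forcing `chromosome` to `str` keeps values such as `1` and `X` in one consistent column type instead of letting pandas infer a mixed one.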

## Standardizing DNA Data Formats\n\nDifferent testing companies use different reference genome builds (e.g., GRCh37/hg19 vs GRCh38/hg38). For genetic genealogy analyses, we need to ensure all data is mapped to the same reference build. Let's explore the conversion process.
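When a per-SNP lookup table of old and new coordinates is available, remapping reduces to a join on `rsid`. The sketch below illustrates that with a left merge; `remap_positions`, the `remapped` flag column, and the identifiers `rsA`/`rsB` are hypothetical and exist only for this illustration. It is a table lookup, not a general coordinate liftover.

```python
import pandas as pd


def remap_positions(genotypes, mapping):
    """Replace positions with GRCh38 coordinates where the mapping has an entry."""
    merged = genotypes.merge(mapping[["rsid", "GRCh38_pos"]], on="rsid", how="left")
    # Flag which rows were found in the mapping table
    merged["remapped"] = merged["GRCh38_pos"].notna()
    # Use the new coordinate when present, otherwise keep the original one
    merged["position"] = merged["GRCh38_pos"].fillna(merged["position"]).astype(int)
    return merged.drop(columns="GRCh38_pos")


# Made-up example: only rsA has an entry in the mapping table
genotypes = pd.DataFrame({"rsid": ["rsA", "rsB"], "position": [100, 200]})
mapping = pd.DataFrame({"rsid": ["rsA"], "GRCh38_pos": [150]})
result = remap_positions(genotypes, mapping)
print(result)
```

Keeping an explicit `remapped` flag makes it easy to see, or later filter out, SNPs whose coordinates are still on the original build.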

In [ ]:
# Load a sample of GRCh37 to GRCh38 coordinate mapping data\nliftover_data = \"\"\"#GRCh37_pos,GRCh38_pos,rsid\n799739,798959,rs11240777\n800817,800007,rs6681049\n838555,838555,rs4970383\n847000,846808,rs12562034\n996062,995669,rs3934834\n1011478,1011278,rs9442372\n1062636,1061531,rs3737728\n1063130,1061928,rs11260588\n1072522,1072319,rs9651273\n1087069,1086864,rs12726255\n\"\"\"\n\n# Parse the liftover data into a DataFrame\nliftover_lines = liftover_data.strip().split('\\n')\nheader = liftover_lines[0].replace('#', '').split(',')\ndata_lines = [line.split(',') for line in liftover_lines[1:]]\nliftover_df = pd.DataFrame(data_lines, columns=header)\n\n# Convert positions to integers\nliftover_df['GRCh37_pos'] = liftover_df['GRCh37_pos'].astype(int)\nliftover_df['GRCh38_pos'] = liftover_df['GRCh38_pos'].astype(int)\n\n# Display the liftover data\nprint(\"GRCh37 to GRCh38 Liftover Data:\")\ndisplay(liftover_df)\n\n# Create a dictionary mapping rsIDs to their GRCh38 positions\nrsid_to_grch38 = dict(zip(liftover_df['rsid'], liftover_df['GRCh38_pos']))\n\n# Function to convert a genotype DataFrame from GRCh37 to GRCh38\ndef convert_to_grch38(df, mapping_dict):\n    # Create a copy of the original DataFrame\n    converted_df = df.copy()\n    \n    # Only update positions for RSIDs that are in our mapping dictionary\n    mask = converted_df['rsid'].isin(mapping_dict.keys())\n    \n    # Store the original positions\n    converted_df['original_position'] = converted_df['position']\n    \n    # Update positions using the mapping dictionary\n    for idx, row in converted_df[mask].iterrows():\n        converted_df.at[idx, 'position'] = mapping_dict[row['rsid']]\n    \n    # Add a column indicating which genome build each position is on\n    converted_df['genome_build'] = 'GRCh37'\n    converted_df.loc[mask, 'genome_build'] = 'GRCh38'\n    \n    return converted_df\n\n# Apply the conversion to our sample 23andMe data (which is typically on GRCh37)\n# Note: none of the sample 23andMe rsIDs appear in this small mapping table,\n# so their positions are left unchanged and every row stays labeled GRCh37\nconverted_df = convert_to_grch38(twentythree_df, rsid_to_grch38)\n\n# Display the converted data\nprint(\"\\nConverted 23andMe Data (GRCh37 to GRCh38):\")\ndisplay(converted_df)

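## Determining Biological Sex from Genetic Data\n\nAn important step in processing raw DNA profiles is determining the biological sex of the individual. Two signals are used together: the fraction of heterozygous calls on the X chromosome (males carry a single X, so their X SNPs outside the pseudoautosomal regions appear homozygous) and the number of Y chromosome SNPs that return a call. Let's demonstrate this process using simulated data.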
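The X heterozygosity ratio is simply the share of called X-chromosome genotypes whose two alleles differ. The sketch below computes it from two-letter genotype strings; `x_heterozygosity` is a hypothetical helper, and it assumes that anything other than two letters from A, C, G, T (for example a no-call placeholder such as `--`) should be left out of the denominator.

```python
VALID_BASES = set("ACGT")


def x_heterozygosity(genotypes):
    """Return the fraction of called genotypes whose two alleles differ."""
    # Keep only two-letter calls made of standard bases; skip no-calls and indels
    called = [g for g in genotypes if len(g) == 2 and set(g) <= VALID_BASES]
    if not called:
        return 0.0
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)


# Four usable calls (AA, AG, CC, GT), two of them heterozygous
ratio = x_heterozygosity(["AA", "AG", "--", "CC", "GT"])
print(ratio)
```

Excluding no-calls matters because counting them as homozygous would push the ratio down and bias low-quality female samples toward a male call.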

In [ ]:
# Create simulated chromosome X and Y data for male and female samples\n\n# Simulate X chromosome SNPs for male (fewer heterozygous calls)\nmale_x_data = \"\"\"\nrsid,chromosome,position,allele1,allele2\nrs11111111,X,10000,A,A\nrs22222222,X,20000,G,G\nrs33333333,X,30000,C,C\nrs44444444,X,40000,T,T\nrs55555555,X,50000,G,G\nrs66666666,X,60000,A,A\nrs77777777,X,70000,C,C\nrs88888888,X,80000,T,T\nrs99999999,X,90000,G,G\nrs10101010,X,100000,A,A\n\"\"\"\n\n# Simulate X chromosome SNPs for female (more heterozygous calls)\nfemale_x_data = \"\"\"\nrsid,chromosome,position,allele1,allele2\nrs11111111,X,10000,A,G\nrs22222222,X,20000,G,A\nrs33333333,X,30000,C,C\nrs44444444,X,40000,T,C\nrs55555555,X,50000,G,G\nrs66666666,X,60000,A,G\nrs77777777,X,70000,C,T\nrs88888888,X,80000,T,T\nrs99999999,X,90000,G,A\nrs10101010,X,100000,A,A\n\"\"\"\n\n# Simulate Y chromosome SNPs for male (has Y chromosome calls)\nmale_y_data = \"\"\"\nrsid,chromosome,position,allele1,allele2\nrs111Y1111,Y,10000,A,A\nrs222Y2222,Y,20000,G,G\nrs333Y3333,Y,30000,C,C\nrs444Y4444,Y,40000,T,T\nrs555Y5555,Y,50000,G,G\n\"\"\"\n\n# Female would have no or very few Y chromosome SNPs, so an empty dataset\nfemale_y_data = \"\"\"\nrsid,chromosome,position,allele1,allele2\n\"\"\"\n\n# Function to parse the simulated data\ndef parse_chromosome_data(data_string):\n    # A string holding only the header row means there are no SNPs.\n    # strip() removes the trailing newline, so compare against the bare header.\n    if data_string.strip() == 'rsid,chromosome,position,allele1,allele2':\n        return pd.DataFrame(columns=['rsid', 'chromosome', 'position', 'allele1', 'allele2'])\n    \n    lines = data_string.strip().split('\\n')\n    data_lines = lines[1:] if lines[0].startswith('rsid') else lines\n    data = [line.split(',') for line in data_lines if line]\n    df = pd.DataFrame(data, columns=['rsid', 'chromosome', 'position', 'allele1', 'allele2'])\n    \n    if not df.empty:\n        df['position'] = df['position'].astype(int)\n        df['genotype'] = df['allele1'] + df['allele2']\n    \n    return df\n\n# Parse the data\nmale_x_df = parse_chromosome_data(male_x_data)\nmale_y_df = parse_chromosome_data(male_y_data)\nfemale_x_df = parse_chromosome_data(female_x_data)\nfemale_y_df = parse_chromosome_data(female_y_data)\n\n# Function to determine sex based on X heterozygosity and Y chromosome presence\ndef determine_sex(x_df, y_df, heterozygous_x_threshold=0.2, y_snp_threshold=3):\n    # Calculate X chromosome heterozygosity (0 when there are no X SNPs at all)\n    total_x_snps = len(x_df)\n    het_x_snps = int((x_df['allele1'] != x_df['allele2']).sum()) if total_x_snps > 0 else 0\n    het_x_ratio = het_x_snps / total_x_snps if total_x_snps > 0 else 0\n    \n    # Count Y chromosome SNPs\n    y_snp_count = len(y_df)\n    \n    # The same statistics are reported for every outcome\n    stats = {\n        \"X_heterozygosity\": het_x_ratio,\n        \"X_total_SNPs\": total_x_snps,\n        \"X_het_SNPs\": het_x_snps,\n        \"Y_SNP_count\": y_snp_count\n    }\n    \n    # Determine sex based on thresholds; both signals must agree\n    if het_x_ratio > heterozygous_x_threshold and y_snp_count < y_snp_threshold:\n        return \"Female\", stats\n    elif het_x_ratio < heterozygous_x_threshold and y_snp_count >= y_snp_threshold:\n        return \"Male\", stats\n    else:\n        return \"Undetermined\", stats\n\n# Determine sex for our simulated samples\nmale_sex, male_stats = determine_sex(male_x_df, male_y_df)\nfemale_sex, female_stats = determine_sex(female_x_df, female_y_df)\n\n# Display results\nprint(\"Sample 1 Determination:\")\nprint(f\"Sex: {male_sex}\")\nprint(f\"Statistics: {male_stats}\")\n\nprint(\"\\nSample 2 Determination:\")\nprint(f\"Sex: {female_sex}\")\nprint(f\"Statistics: {female_stats}\")

## Visualizing the Sex Determination Process\n\nLet's create visualizations to better understand how the sex determination process works based on X chromosome heterozygosity and Y chromosome SNP counts.

In [ ]:
# Create simulated data for multiple samples\nimport random\n\n# Generate synthetic data\nrandom.seed(42)  # For reproducibility\n\n# Create sample data\nsample_data = []\n\n# Generate 50 synthetic female samples\nfor i in range(50):\n    x_het = random.uniform(0.25, 0.5)  # Higher X heterozygosity\n    y_count = random.randint(0, 2)     # Few or no Y SNPs\n    sample_data.append({\n        'sample_id': f'F{i+1}',\n        'X_heterozygosity': x_het,\n        'Y_SNP_count': y_count,\n        'actual_sex': 'Female'\n    })\n\n# Generate 50 synthetic male samples\nfor i in range(50):\n    x_het = random.uniform(0, 0.15)     # Lower X heterozygosity\n    y_count = random.randint(4, 20)     # More Y SNPs\n    sample_data.append({\n        'sample_id': f'M{i+1}',\n        'X_heterozygosity': x_het,\n        'Y_SNP_count': y_count,\n        'actual_sex': 'Male'\n    })\n\n# Add a few ambiguous samples\nfor i in range(5):\n    x_het = random.uniform(0.15, 0.25)  # Borderline X heterozygosity\n    y_count = random.randint(2, 4)      # Borderline Y SNP count\n    sample_data.append({\n        'sample_id': f'U{i+1}',\n        'X_heterozygosity': x_het,\n        'Y_SNP_count': y_count,\n        'actual_sex': 'Undetermined'\n    })\n\n# Convert to DataFrame\nsample_df = pd.DataFrame(sample_data)\n\n# Build a synthetic X genotype table with the requested heterozygosity:\n# n_snps rows in total, of which round(x_het * n_snps) are heterozygous (A/G)\n# and the rest are homozygous (A/A)\ndef simulate_x_genotypes(x_het, n_snps=100):\n    n_het = int(round(x_het * n_snps))\n    return pd.DataFrame({\n        'allele1': ['A'] * n_snps,\n        'allele2': ['G'] * n_het + ['A'] * (n_snps - n_het)\n    })\n\n# Apply our sex determination function to each sample\nsample_df['predicted_sex'] = sample_df.apply(\n    lambda row: determine_sex(\n        simulate_x_genotypes(row['X_heterozygosity']), \n        pd.DataFrame({'rsid': ['rs1'] * int(row['Y_SNP_count'])})\n    )[0], \n    axis=1\n)\n\n# Display a sample of the data\nprint(\"Synthetic Sample Data:\")\ndisplay(sample_df.head(10))\n\n# Create a scatter plot to visualize the sex determination\nplt.figure(figsize=(10, 6))\n\n# Plot each sex group with different colors\nfor sex, color in zip(['Male', 'Female', 'Undetermined'], ['blue', 'red', 'purple']):\n    subset = sample_df[sample_df['actual_sex'] == sex]\n    plt.scatter(subset['X_heterozygosity'], subset['Y_SNP_count'], c=color, label=sex, alpha=0.7)\n\n# Add decision boundaries\nplt.axhline(y=3, color='gray', linestyle='--', alpha=0.7)  # Y chromosome threshold\nplt.axvline(x=0.2, color='gray', linestyle='--', alpha=0.7)  # X heterozygosity threshold\n\n# Add labels for the regions\nplt.text(0.05, 10, 'Male Region', fontsize=12, ha='center')\nplt.text(0.35, 1, 'Female Region', fontsize=12, ha='center')\n\n# Plot formatting\nplt.xlabel('X Chromosome Heterozygosity Ratio')\nplt.ylabel('Y Chromosome SNP Count')\nplt.title('Sex Determination Based on X Heterozygosity and Y SNP Count')\nplt.legend()\nplt.grid(True, alpha=0.3)\n\nplt.tight_layout()\nplt.show()\n\n# List mismatched samples (predicted sex != actual sex)\nmismatched = sample_df[sample_df['predicted_sex'] != sample_df['actual_sex']]\nprint(f\"\\nNumber of mismatched samples: {len(mismatched)}\")\nif len(mismatched) > 0:\n    display(mismatched)
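Beyond listing individual mismatches, a cross-tabulation of actual against predicted labels shows at a glance which categories get confused with which. The sketch below uses `pandas.crosstab` on a small made-up table; the five rows are illustrative values, not output from the simulation above.

```python
import pandas as pd

# Made-up labels for five samples (illustration only)
results = pd.DataFrame({
    "actual_sex": ["Female", "Female", "Male", "Male", "Undetermined"],
    "predicted_sex": ["Female", "Female", "Male", "Undetermined", "Male"],
})

# Rows are the actual labels, columns the predicted labels
confusion = pd.crosstab(results["actual_sex"], results["predicted_sex"])
print(confusion)

# Overall agreement: share of rows where prediction equals the actual label
accuracy = (results["actual_sex"] == results["predicted_sex"]).mean()
print(f"Agreement: {accuracy:.0%}")
```

Off-diagonal cells in the `Undetermined` row or column are expected for borderline samples and usually call for manual review rather than a threshold change.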

## Converting to VCF Format\n\nAfter standardizing and processing raw DNA profiles, the next step is to convert them to Variant Call Format (VCF). VCF is the standard format used in most genetic analysis tools. In the local environment, this is done using bcftools, but here we'll illustrate the conceptual process.

In [ ]:
# Function to convert our standardized DataFrame to a simplified VCF format\ndef create_vcf_structure(df, sample_id):\n    # Create the VCF header\n    vcf_header = [\n        '##fileformat=VCFv4.2',\n        '##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">',\n        '##reference=GRCh38',\n        f'##source=Converted from raw data for sample {sample_id}',\n        '#CHROM\\tPOS\\tID\\tREF\\tALT\\tQUAL\\tFILTER\\tINFO\\tFORMAT\\t' + sample_id\n    ]\n    \n    # Create VCF records from the DataFrame\n    vcf_records = []\n    for _, row in df.iterrows():\n        chrom = row['chromosome']\n        pos = row['position']\n        rsid = row['rsid']\n        \n        # Since we don't have reference information in our simplified example,\n        # we'll use the first allele as REF and the second as ALT if they differ\n        ref = row['allele1']\n        alt = row['allele2'] if row['allele1'] != row['allele2'] else '.'\n        \n        # Set genotype - 0 for REF, 1 for ALT.\n        # Because REF is taken from the sample itself, a homozygous call can only\n        # be written as 0/0 here; telling homozygous reference apart from\n        # homozygous alternate (1/1) requires a real reference genome.\n        if alt == '.':\n            # Homozygous (both alleles equal the assumed REF)\n            gt = '0/0'\n        else:\n            # Heterozygous\n            gt = '0/1'\n        \n        # Create the VCF record\n        record = f\"{chrom}\\t{pos}\\t{rsid}\\t{ref}\\t{alt}\\t.\\tPASS\\t.\\tGT\\t{gt}\"\n        vcf_records.append(record)\n    \n    # Combine header and records\n    vcf_content = '\\n'.join(vcf_header + vcf_records)\n    return vcf_content\n\n# Create a simplified VCF for our AncestryDNA sample\nancestry_vcf = create_vcf_structure(ancestry_df, 'AncestryDNA_sample')\n\n# Display the first 20 lines of the VCF\nprint(\"Simplified VCF Output (first 20 lines):\")\nprint('\\n'.join(ancestry_vcf.split('\\n')[:20]))
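The cell above takes REF from the sample's own first allele. When the true reference and alternate alleles for a site are known, the GT field can be encoded properly. The following is a minimal sketch for biallelic SNPs only; `encode_gt` is a hypothetical helper, and returning `./.` for alleles that match neither REF nor ALT is a design choice made for this example.

```python
def encode_gt(ref, alt, allele1, allele2):
    """Encode an unphased biallelic genotype as a VCF GT string.

    0 stands for the REF allele, 1 for the ALT allele.
    Returns './.' (missing) if either allele matches neither REF nor ALT.
    """
    codes = {ref: "0", alt: "1"}
    if allele1 not in codes or allele2 not in codes:
        return "./."
    # Sort so that heterozygous calls are always written as 0/1, never 1/0
    return "/".join(sorted([codes[allele1], codes[allele2]]))


print(encode_gt("A", "G", "A", "A"))  # homozygous reference
print(encode_gt("A", "G", "G", "A"))  # heterozygous
print(encode_gt("A", "G", "G", "G"))  # homozygous alternate
print(encode_gt("A", "G", "T", "A"))  # allele not in REF/ALT
```

With a real reference in hand, the homozygous alternate case (`1/1`) becomes distinguishable from homozygous reference (`0/0`), which the simplified conversion cannot do.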

## Merging Multiple Samples\n\nIn a real genetic genealogy project, you'd typically work with multiple samples. After processing individual profiles, they can be merged into a single VCF file for analysis. Let's simulate merging two samples.

In [ ]:
# Simulate merging VCF files for two samples\n\n# Create a simplified function to merge two DataFrames representing VCF data\ndef merge_vcf_data(df1, df2):\n    # Get common SNPs between the two datasets\n    common_snps = set(df1['rsid']).intersection(set(df2['rsid']))\n    \n    # Filter both DataFrames to include only common SNPs\n    df1_common = df1[df1['rsid'].isin(common_snps)].copy()\n    df2_common = df2[df2['rsid'].isin(common_snps)].copy()\n    \n    # Make sure both DataFrames have the same order of SNPs\n    df1_common = df1_common.set_index('rsid')\n    df2_common = df2_common.set_index('rsid')\n    \n    # Align them based on the index\n    df1_common, df2_common = df1_common.align(df2_common)\n    \n    # Reset the index\n    df1_common = df1_common.reset_index()\n    df2_common = df2_common.reset_index()\n    \n    # Create a new DataFrame for the merged data (positions and alleles come from sample 1)\n    merged_df = df1_common[['rsid', 'chromosome', 'position', 'allele1', 'allele2']].copy()\n    \n    # Add genotype columns for both samples\n    merged_df['sample1_genotype'] = df1_common['allele1'] + df1_common['allele2']\n    merged_df['sample2_genotype'] = df2_common['allele1'] + df2_common['allele2']\n    \n    return merged_df\n\n# Merge our AncestryDNA and 23andMe samples (for demonstration)\n# Note: the two demonstration samples share no rsIDs, so the merged table is empty here\nmerged_df = merge_vcf_data(ancestry_df, twentythree_df)\n\n# Display the merged data\nprint(\"Merged VCF Data (Ancestry + 23andMe):\")\ndisplay(merged_df.head())\n\n# Calculate some statistics on the merged data\nprint(f\"\\nTotal SNPs in merged data: {len(merged_df)}\")\n\n# Find concordant/discordant genotypes\nmerged_df['concordant'] = merged_df['sample1_genotype'] == merged_df['sample2_genotype']\nconcordant_count = merged_df['concordant'].sum()\nconcordance_rate = concordant_count / len(merged_df) if len(merged_df) > 0 else 0\n# Report 0% (not 100%) discordance when there are no shared SNPs to compare\ndiscordance_rate = 1 - concordance_rate if len(merged_df) > 0 else 0\n\nprint(f\"Concordant SNPs: {concordant_count} ({concordance_rate:.2%})\")\nprint(f\"Discordant SNPs: {len(merged_df) - concordant_count} ({discordance_rate:.2%})\")\n\n# Show example of discordant SNPs\nif (len(merged_df) - concordant_count) > 0:\n    print(\"\\nExample of discordant SNPs:\")\n    display(merged_df[~merged_df['concordant']].head())
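A plain string comparison treats `AG` and `GA` as different genotypes, although for unphased data they describe the same call. The sketch below joins two samples on `rsid` with `DataFrame.merge` and compares genotypes after sorting the alleles; `normalize_genotype`, `genotype_concordance`, and the identifiers `rsA` to `rsD` are hypothetical names and made-up data for illustration.

```python
import pandas as pd


def normalize_genotype(genotype):
    """Sort the alleles so that 'GA' and 'AG' compare as equal."""
    return "".join(sorted(genotype))


def genotype_concordance(df1, df2):
    """Inner-join two samples on rsid and return (merged table, concordance rate).

    The rate is None when the samples share no SNPs.
    """
    merged = df1.merge(df2, on="rsid", suffixes=("_1", "_2"))
    if merged.empty:
        return merged, None
    same = (
        merged["genotype_1"].map(normalize_genotype)
        == merged["genotype_2"].map(normalize_genotype)
    )
    merged["concordant"] = same
    return merged, float(same.mean())


# Made-up samples: rsA and rsB are shared, rsC and rsD are not
a = pd.DataFrame({"rsid": ["rsA", "rsB", "rsC"], "genotype": ["AG", "CC", "TT"]})
b = pd.DataFrame({"rsid": ["rsA", "rsB", "rsD"], "genotype": ["GA", "CT", "GG"]})
merged, rate = genotype_concordance(a, b)
print(merged)
print(rate)
```

Returning `None` for an empty overlap keeps "nothing to compare" distinct from "0% concordant", which are easy to confuse in summary output.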

## Connecting to the Full Lab Environment\n\nThis notebook provides a simplified introduction to processing raw DNA profiles in the browser. For working with real genetic data, you'll want to use the full notebook in a local environment.\n\n### How to Continue in the Local Environment\n\n1. Download the full Lab 2 notebook from the course materials\n2. Set up your local Python environment following the course instructions\n3. Install the required bioinformatics tools (e.g., lineage, bcftools)\n4. Run the notebook with your own raw DNA data\n\n### What's Different in the Full Version?\n\n- Processes actual raw DNA files from consumer testing companies\n- Uses more robust tools for genome build conversion\n- Performs complete VCF conversion with proper reference genome alignment\n- Includes file system operations to handle large datasets\n\nIn the next lab, we'll explore how to perform quality control on processed DNA data, building on what we've learned here.

In [ ]:
# Conclusion\n\n# Create a summary visualization of the processing workflow\nfrom matplotlib.patches import FancyArrowPatch\n\n# Create a figure for the workflow diagram\nplt.figure(figsize=(12, 6))\n\n# Define the stages of processing\nstages = [\n    \"Raw DNA Files\\n(23andMe, Ancestry, etc.)\",\n    \"Parse and\\nStandardize\",\n    \"Remap to\\nGRCh38\",\n    \"Determine\\nBiological Sex\",\n    \"Convert to\\nVCF Format\",\n    \"Merge\\nSamples\"\n]\n\n# Calculate positions for the stages\npositions = [(i, 0) for i in range(len(stages))]\n\n# Plot each stage as a node\nfor i, (pos, label) in enumerate(zip(positions, stages)):\n    plt.plot(pos[0], pos[1], 'ko', markersize=15, alpha=0.8)\n    plt.text(pos[0], pos[1]-0.15, label, ha='center', va='top', fontsize=10, wrap=True)\n\n# Add arrows between stages\nfor i in range(len(positions)-1):\n    arrow = FancyArrowPatch(\n        positions[i], positions[i+1],\n        arrowstyle='-|>',\n        color='blue',\n        mutation_scale=15,\n        linewidth=2\n    )\n    plt.gca().add_patch(arrow)\n\n# Add explanatory notes for key stages\nnotes = [\n    \"\",\n    \"Handles different\\ninput formats\",\n    \"Ensures consistent\\ngenomic coordinates\",\n    \"Uses X heterozygosity\\nand Y SNP counts\",\n    \"Standard format for\\ngenetic analysis\",\n    \"Combines multiple\\nsamples for analysis\"\n]\n\n# Add the notes below the diagram\nfor i, note in enumerate(notes):\n    if note:  # Skip empty notes\n        plt.text(i, -0.45, note, ha='center', va='top', fontsize=8, style='italic', wrap=True)\n\n# Set the plot limits and remove axes\nplt.xlim(-0.5, len(stages)-0.5)\nplt.ylim(-1, 0.5)\nplt.axis('off')\n\nplt.title('DNA Profile Processing Workflow', fontsize=14)\nplt.tight_layout()\nplt.show()\n\nprint(\"Lab 2 completed successfully!\")