
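**Contigs** are gap-free stretches of sequence assembled from overlapping reads, and they are the basic units for accurately reconstructing the DNA sequence of a specific region. **Scaffolds** extend this by placing contigs in their probable order and orientation, bridging the gaps between them with estimated sequence or placeholders.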

The **contig N50** is a statistical measure: the length of the longest contig (contiguous assembled sequence) for which the sum of the lengths of all contigs of equal or greater length is at least 50% of the total assembly length. In other words, at least half of the assembled bases lie in contigs of this length or longer. (When an estimated genome size is used as the reference total instead of the assembly length, the statistic is usually called NG50.) A higher contig N50 value indicates that the assembly contains fewer and longer contigs, which is generally desirable because it suggests a more contiguous assembly. A lower contig N50 value indicates that the assembly is more fragmented, with shorter contigs.
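The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular assembly tool: the function name `n50` and the example lengths are made up for this sketch, and the total is taken to be the sum of the contig lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and accumulate them; the N50
    is the length at which the running sum first reaches half of the total.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (total = 300, so the halfway point is 150)
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))  # 80 + 70 = 150 reaches the halfway point, so N50 = 70
```

Sorting in descending order means the first contig that pushes the running sum to the halfway point is the one whose length is reported.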

The **scaffold N50** is similar to the contig N50, but it considers scaffolds instead of contigs. Scaffolds are ordered and oriented contigs with gaps (represented by runs of N) between them. Likewise, the scaffold N50 is the length of the longest scaffold for which the sum of the lengths of all scaffolds of equal or greater length is at least 50% of the total assembly length. A higher scaffold N50 value suggests that the assembly is more contiguous, with fewer and longer scaffolds, meaning that more contigs have been ordered and oriented into scaffolds that bridge the gaps between them.
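To make the relationship between the two statistics concrete, the sketch below splits scaffolds into contigs at runs of N and computes both values. It assumes that any run of one or more N characters counts as a gap (real tools often apply a minimum gap length), and the helper names and toy sequences are hypothetical.

```python
import re


def n50(lengths):
    """Length at which the descending running sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one length")


def split_scaffold(scaffold):
    """Split a scaffold into contigs at runs of N (assumption: any run is a gap)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


# Toy scaffolds: the first has two contigs (20 bp and 12 bp) joined by a 10 bp gap
scaffolds = ["ACGT" * 5 + "N" * 10 + "ACGT" * 3, "ACGTAC"]

scaffold_lengths = [len(s) for s in scaffolds]
contig_lengths = [len(c) for s in scaffolds for c in split_scaffold(s)]

print("scaffold N50:", n50(scaffold_lengths))  # lengths 42 and 6 -> 42
print("contig N50:", n50(contig_lengths))      # lengths 20, 12 and 6 -> 20
```

Because scaffold lengths include the gap placeholders, the scaffold N50 is never smaller than the contig N50 of the same assembly.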

**Both contig N50 and scaffold N50** are widely used to evaluate the contiguity of genome assemblies, especially when comparing different assembly strategies or tools. Neither measures correctness on its own, so they are usually read alongside other quality checks.

In [1]:
#| eval: false
import pandas as pd
import seaborn as sns
import numpy as np

# Create a sample multi-indexed DataFrame
index = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['X', 'Y', 'Z']])
df = pd.DataFrame(
    np.random.randn(9, 3), index=index, columns=['Col1', 'Col2', 'Col3'])

# Create a seaborn color palette with one color per first-level index label
first_level = df.index.levels[0]
palette = sns.color_palette("husl", n_colors=len(first_level))
color_map = dict(zip(first_level, palette))

# Function to apply colors to index levels
def color_index(row):
    # Get color based on first level of index (row.name is the index tuple)
    r, g, b = (int(c * 255) for c in color_map[row.name[0]])
    return [f'background-color: rgba({r}, {g}, {b}, 0.5)'] * len(row)

# Apply the styling
styled_df = df.style.apply(color_index, axis=1)



# Display the styled DataFrame
styled_df

# stats_df.Contigs[]  # incomplete expression: the index inside [] is missing

In [2]:
#| label: best output so far
#| eval: false
# This gives a color scheme that visually distinguishes the first level of the
# index while grouping rows by their second-level index across all columns.
# The column headers also take colors from the same palette as the
# second-level index. Adjust the color palettes and alpha values to fine-tune
# the appearance as needed.

import pandas as pd
import seaborn as sns
import numpy as np

# Create color palettes for first level index and second level index + columns
n_colors1 = len(df.index.levels[0])
n_colors2 = len(df.index.levels[1]) + len(df.columns)
palette1 = sns.color_palette("husl", n_colors=n_colors1)
palette2 = sns.color_palette("Set2", n_colors=n_colors2)

# Map index levels and column names to colors with dictionaries
color_dict1 = dict(zip(df.index.levels[0], palette1))
color_dict2 = dict(zip(list(df.index.levels[1]) + list(df.columns), palette2))

def color_to_rgba(color, alpha = 0.5):
    return f'background-color: rgba({", ".join(f"{int(c*255)}" for c in color)}, {alpha})'

def color_index(label):
    # map_index passes one label at a time (not the full index tuple), so look
    # the label up in each dictionary in turn
    if label in color_dict1:
        # First-level index labels
        return color_to_rgba(color_dict1[label])
    elif label in color_dict2:
        # Second-level index labels and column names
        return color_to_rgba(color_dict2[label])
    else:
        return ''

def color_values(s):
    # One style per cell in the column, colored by the row's second-level index label
    return [color_to_rgba(color_dict2[idx[1]]) for idx in s.index]

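# Apply the styling: row index labels, column headers, then cell values
# (map_index needs pandas 2.1+; earlier versions name it applymap_index)
styled_df = df.style.map_index(color_index, axis=0)
styled_df = styled_df.map_index(color_index, axis=1)
styled_df = styled_df.apply(color_values, axis=0)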

# Display the styled DataFrame
styled_df