# Assembly Completeness Analysis for Aedes Genome Assembly

This Jupyter notebook analyzes genome assembly completeness. Its main objective is to visualize and compare completeness across datasets in a single bar chart.

## Libraries Used
The notebook uses the following Python libraries for data analysis and visualization:
- Pandas for tabular data manipulation.
- Matplotlib and Seaborn for creating charts.
- NumPy for mathematical operations.
- `os` for file and directory management.
- `json` for loading JSON files.

## Initial Configurations
Before the analysis, the notebook sets a few initial options: the format for inline chart display and the filename used to save the chart.

- Chart filename: filename
- Organism under examination: organism
- Destination folder path for charts: path

## Data Loading
The input data is stored in JSON files, which the notebook loads and analyzes. The JSON file paths are collected by iterating through the directories under the specified path.
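
The directory walk described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the function names `find_json_files` and `load_summaries` are hypothetical, and it assumes one JSON file per dataset directory (a later file in the same directory would overwrite an earlier one).

```python
import json
import os


def find_json_files(root):
    """Return the sorted paths of every .json file found under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".json"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def load_summaries(root):
    """Map each dataset (its directory, relative to root) to its parsed JSON."""
    summaries = {}
    for json_path in find_json_files(root):
        # Assumption: the directory name identifies the dataset.
        dataset = os.path.relpath(os.path.dirname(json_path), root)
        with open(json_path, encoding="utf-8") as handle:
            summaries[dataset] = json.load(handle)
    return summaries
```

Calling `load_summaries(path)` with the configured folder would then return a dictionary keyed by dataset name, ready to be turned into a table.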


## Bar Chart Creation
The bar chart visualizes assembly completeness across datasets. Three colors represent the three completeness categories: "Complete," "Fragmented," and "Missing."

The chart features include:
- Y-axis labels for dataset names.
- X-axis representing the percentage of completeness.
- Colored bars representing completeness in different categories.
- Legend identifying the categories.
- Chart title including the name of the organism under examination.
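
A horizontal stacked bar chart with these features might be sketched as follows. The names `stacked_segments` and `plot_completeness`, the shape of the `datasets` dictionary, and the three colors are all assumptions made for illustration; the notebook does not state which colors or data layout it uses. Matplotlib is assumed to be installed and is imported inside the plotting function so that the layout helper can be used on its own.

```python
CATEGORIES = ("Complete", "Fragmented", "Missing")
# Placeholder colors (hypothetical; not taken from the notebook).
COLORS = {"Complete": "#2dacd6", "Fragmented": "#eded13", "Missing": "#d62728"}


def stacked_segments(percentages):
    """Return (category, left_offset, width) triples for one stacked bar."""
    segments = []
    left = 0.0
    for category in CATEGORIES:
        width = float(percentages.get(category, 0.0))
        segments.append((category, left, width))
        left += width  # the next segment starts where this one ends
    return segments


def plot_completeness(datasets, organism):
    """Draw one horizontal stacked bar per dataset and return the Axes."""
    import matplotlib.pyplot as plt  # local import: only needed for plotting

    names = list(datasets)
    fig, ax = plt.subplots(figsize=(8, 0.6 * len(names) + 1.5))
    for row, name in enumerate(names):
        for category, left, width in stacked_segments(datasets[name]):
            # Label only the first row so each category appears once in the legend.
            label = category if row == 0 else "_nolegend_"
            ax.barh(row, width, left=left, color=COLORS[category], label=label)
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    ax.set_xlabel("Completeness (%)")
    ax.set_xlim(0, 100)
    ax.set_title(f"Assembly completeness: {organism}")
    ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.2), ncol=len(CATEGORIES))
    return ax
```

For example, `plot_completeness({"dataset_a": {"Complete": 90, "Fragmented": 4, "Missing": 6}}, organism)` would draw a single bar split into three colored segments.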

## Customization and Saving
The chart is customized for readability: its borders and y-axis ticks are removed. Finally, the chart is saved as a PNG file in the specified folder.
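
The styling and saving step could be sketched like this, assuming `ax` is a Matplotlib `Axes` that already holds the chart. The helper names `chart_path` and `style_and_save` and the default `dpi` value are hypothetical.

```python
import os


def chart_path(folder, filename):
    """Build the output path, adding a .png extension when it is missing."""
    if not filename.lower().endswith(".png"):
        filename += ".png"
    return os.path.join(folder, filename)


def style_and_save(ax, folder, filename, dpi=300):
    """Hide the frame and y-axis tick marks, then write the figure as PNG."""
    for spine in ax.spines.values():
        spine.set_visible(False)  # remove the chart borders
    ax.tick_params(axis="y", length=0)  # keep the labels, drop the tick marks
    os.makedirs(folder, exist_ok=True)
    out = chart_path(folder, filename)
    ax.figure.savefig(out, dpi=dpi, bbox_inches="tight")
    return out
```

Setting the tick length to zero rather than clearing the ticks keeps the dataset names visible on the y-axis.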

This notebook provides an overview of the assembly completeness analysis and generates a chart that summarizes the results.

In [3]:
# load libraries
library(RIdeogram)  # library() stops with an error if the package is missing; require() only warns

# From the RStudio console, set the working directory to the "plot_markers/" folder:
# setwd("/path/to/support_protocol2/plot_markers/")


karyotype_file <- "./data/karyotype/Dmel_karyotype.txt"  # provide path to karyotype file 
coordinates <- "./data/full_table/Dmel_full_table.tsv" # provide path to markers' coordinates
out_name <- "Dmel" # sample name

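# load karyotype (tab-separated, with a header row)
karyotype <- read.csv(karyotype_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# load BUSCO marker coordinates (tab-separated, no header row)
# the code below expects exactly four columns: status, sequence name, start, end
busco_mappings <- read.csv(coordinates, sep = "\t", header = FALSE, stringsAsFactors = FALSE)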

colnames(busco_mappings)[1:4] <- c("Status", "Chr", "Start", "End")

busco_mappings$Type <- "BUSCO_marker"
busco_mappings$Shape <- "circle" 

# replace each status with a hex color code (written without a leading '#')
busco_mappings$Status <- gsub("Complete", '2dacd6', busco_mappings$Status)
busco_mappings$Status <- gsub("Duplicated", '0a0e1a', busco_mappings$Status)
busco_mappings$Status <- gsub("Fragmented", 'eded13', busco_mappings$Status)
colnames(busco_mappings)[1] <- "color"

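# reorder the columns by name: Type, Shape, Chr, Start, End, color
busco_mappings <- busco_mappings[, c("Type", "Shape", "Chr", "Start", "End", "color")]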

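# Guard: at least one marker must sit on a chromosome listed in the karyotype (first column).
# A name mismatch is one possible cause (an assumption, not verified here) of the
# "replacement has 1 row, data has 0" error.
if (!any(busco_mappings$Chr %in% karyotype[[1]])) {
  stop("No marker sequence name matches a chromosome name in the karyotype file.")
}

ideogram(karyotype = karyotype, label = busco_mappings, label_type = "marker", output = paste0(out_name, ".svg"))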
# convert to png
convertSVG(paste0(out_name, ".svg"), device = "png")



ERROR: Error in `$<-.data.frame`(`*tmp*`, "y", value = NA): replacement has 1 row, data has 0
