
Generating embeddings after model training #14

Closed Answered by gonzalobenegas
katiana22 asked this question in Q&A
Hello!

windows.parquet is produced by the rules download_annotation, expand_annotation, and define_embedding_windows. It needs two input files:

  • Gene annotation (gff/gtf): we obtained it from Ensembl Plants, but there are many other sources, which might vary slightly in format.
  • RepeatMasker file: can often be downloaded from the UCSC Genome Browser. We obtained ours from the PlantRegMap Genome Browser, which also hosts RepeatMasker files for many other plants. The official UCSC Genome Browser provides them for other species, such as vertebrates.
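
Since annotation files from different sources can vary slightly in format, it may help to take a quick look at a new file before running the rules on it. Below is a minimal sketch, not part of the pipeline: it assumes the standard nine tab-separated GFF/GTF columns, and the function name summarize_annotation and the tiny in-memory example are made up for illustration. It counts feature types and collects the sequence names, so you can check that features such as gene and exon are present and that chromosome names match the ones used in your RepeatMasker file.

```python
import io
from collections import Counter

# The nine tab-separated columns shared by GFF and GTF files.
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def summarize_annotation(handle):
    """Count feature types and collect sequence names from a GFF/GTF stream.

    Hypothetical helper for illustration only; it is not part of the pipeline.
    """
    feature_counts = Counter()
    seqids = set()
    for line_number, line in enumerate(handle, start=1):
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(GFF_COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(GFF_COLUMNS)} columns, "
                f"got {len(fields)}"
            )
        record = dict(zip(GFF_COLUMNS, fields))
        feature_counts[record["type"]] += 1
        seqids.add(record["seqid"])
    return feature_counts, seqids


# Made-up records standing in for a real annotation file.
example = io.StringIO(
    "##gff-version 3\n"
    "1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "1\tdemo\texon\t100\t400\t.\t+\t.\tParent=gene1\n"
    "2\tdemo\tgene\t50\t700\t.\t-\t.\tID=gene2\n"
)
feature_counts, seqids = summarize_annotation(example)
print(dict(feature_counts))
print(sorted(seqids))
```

For a real file you would pass an open file handle instead of the StringIO object (and gzip.open(path, "rt") if the download is compressed).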

I don't think you would encounter significant issues porting this to another species. Let me know!

Answer selected by gonzalobenegas