# How to Use the GenBank Type Provider
This notebook outlines how the GenBank Type Provider can be used to access the genomic data stored on the GenBank FTP server. The following topics are covered:

1. How to create a typed representation of a GenBank Assembly
2. How to use wildcards to create a typed representation of a GenBank Assembly
3. How to access the Genome of a GenBank Assembly
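On the GenBank FTP server, assemblies are organised by taxonomic group, then species, then assembly. The sketch below builds the URL where NCBI publishes an assembly's GenBank Flat File. It assumes the `genomes/genbank/<group>/<species>/all_assembly_versions/<assembly>/` directory layout and the `<assembly>_genomic.gbff.gz` file name. The helper `assemblyUrl` is hypothetical and is not part of BioProviders. It only illustrates how the three parameters used throughout this notebook relate to that layout.

```fsharp
// Hypothetical helper (not part of BioProviders): builds the URL of an
// assembly's GenBank Flat File, assuming NCBI's FTP layout
// genomes/genbank/<taxon>/<species>/all_assembly_versions/<assembly>/
let assemblyUrl (taxon: string) (species: string) (assembly: string) : string =
    sprintf "https://ftp.ncbi.nlm.nih.gov/genomes/genbank/%s/%s/all_assembly_versions/%s/%s_genomic.gbff.gz"
        taxon species assembly assembly

let url = assemblyUrl "bacteria" "Staphylococcus_lugdunensis" "GCA_000185485.1_ASM18548v1"
printfn "%s" url
```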

## Creating a Typed Representation of a GenBank Assembly
If the taxon, species, and assembly names are all known, the GenBankProvider can directly create the typed representation of the GenBank Assembly. 

The example below shows how to access the genomic sequence of an Assembly.

In [None]:
// Load the BioProviders Project
#r "nuget: BioProviders, 1.0.0"

open BioProviders


// Generate the Assembly Type
let [<Literal>] Taxon = "bacteria"
let [<Literal>] Species = "Staphylococcus_lugdunensis"
let [<Literal>] Assembly = "GCA_000185485.1_ASM18548v1"

type AssemblyType = GenBankProvider<Taxon, Species, Assembly>


// Use the Assembly Type
let gbff = AssemblyType.Genome()

gbff.Sequence.GetSubSequence 0 20
|> fun x -> x.ToString()
|> printfn "The first 20 bases for the sequence are: %s"


The first 20 bases for the sequence are: GTTATATTGGAGAATTAGCT


### Code Breakdown

**1. Loading the BioProviders Project**

Referencing the BioProviders NuGet package and opening `BioProviders` makes the GenBank Type Provider available to your programs.

In [None]:
#r "nuget: BioProviders, 1.0.0"

open BioProviders


**2. Generating the Assembly Type**

To directly create a GenBank Assembly, the following information must be known:

* Taxonomic group
* Species 
* Assembly name

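Once this information is known, the GenBank Assembly Type can be generated. If these parameters are defined as named values rather than written inline in the Type Provider, they must be marked with `[<Literal>]`, because Type Provider arguments must be compile-time constants.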


In [None]:
// The taxonomic group of the species
let [<Literal>] Taxon = "bacteria"

// The species name in the form Genus_species
let [<Literal>] Species = "Staphylococcus_lugdunensis"

// The assembly name
let [<Literal>] Assembly = "GCA_000185485.1_ASM18548v1"

// Create the Assembly Type
type AssemblyType = GenBankProvider<Taxon, Species, Assembly>


**3. Using the Assembly Type**

After following these steps, the GenBank Assembly Type has been created and can now be used in your programs. For example, you can now access the genomic sequence of the assembly.

In [None]:
// Load the GenBank Flat File
let gbff = AssemblyType.Genome()

// Print the start of the sequence
gbff.Sequence.GetSubSequence 0 20
|> fun x -> x.ToString()
|> printfn "The first 20 bases for the sequence are: %s"


The first 20 bases for the sequence are: GTTATATTGGAGAATTAGCT


## Using Wildcards to Create a Typed Representation of a GenBank Assembly
Wildcard operators can be used to access collections of taxa, species, or assemblies based on a provided pattern. This is useful if the full taxon, species, or assembly names are not known.

### Assembly Wildcards

**1. All Assemblies**

To access all assemblies for a species, omit the Assembly parameter from the Type Provider. This generates a type containing all the assemblies for the species, from which a single assembly can then be selected by name.

In [None]:
// Load the BioProviders Project
#r "nuget: BioProviders, 1.0.0"

open BioProviders

// Generate the Assembly Type
let [<Literal>] Taxon = "bacteria"
let [<Literal>] Species = "Staphylococcus_lugdunensis"

type Assemblies = GenBankProvider<Taxon, Species>
type AssemblyType = Assemblies.``GCA_000185485.1_ASM18548v1``

// Load the GenBank Flat File for the Assembly
let gbff = AssemblyType.Genome()


**2. Subset of Assemblies**

To access a subset of assemblies for a species, the \* operator can be used at the end of the Assembly parameter to match all assemblies beginning with the specified pattern. This will generate a type containing all the assemblies matching the pattern. Then, a single assembly can be selected.

In [None]:
// Load the BioProviders Project
#r "nuget: BioProviders, 1.0.0"

open BioProviders

// Generate the Assembly Type
let [<Literal>] Taxon = "bacteria"
let [<Literal>] Species = "Staphylococcus_lugdunensis"
let [<Literal>] Assembly = "GCA_000*"

type Assemblies = GenBankProvider<Taxon, Species, Assembly>
type AssemblyType = Assemblies.``GCA_000185485.1_ASM18548v1``

// Load the GenBank Flat File for the Assembly
let gbff = AssemblyType.Genome()
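The selection rules described above can be modelled with ordinary F# functions. This is a hypothetical illustration and not BioProviders' implementation. `matchesPattern` treats a trailing `*` as "any assembly whose name starts with the preceding text" and otherwise requires an exact match. `selectAssemblies` returns every assembly when no pattern is given (`None`), which mirrors omitting the Assembly parameter. The example list contains only the assembly name used in this notebook.

```fsharp
open System

// A pattern ending in * matches names starting with the text before the *;
// any other pattern must match the name exactly.
let matchesPattern (pattern: string) (name: string) : bool =
    if pattern.EndsWith("*", StringComparison.Ordinal) then
        let prefix = pattern.Substring(0, pattern.Length - 1)
        name.StartsWith(prefix, StringComparison.Ordinal)
    else
        name = pattern

// No pattern (None) selects every assembly, like omitting the parameter
let selectAssemblies (pattern: string option) (names: string list) : string list =
    match pattern with
    | None -> names
    | Some p -> names |> List.filter (matchesPattern p)

let names = [ "GCA_000185485.1_ASM18548v1" ]

printfn "%A" (selectAssemblies (Some "GCA_000*") names)
printfn "%A" (selectAssemblies (Some "GCA_001*") names)
printfn "%A" (selectAssemblies None names)
```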


### Species Wildcards
Species wildcards have not yet been implemented.

### Taxon Wildcards
Taxon wildcards have not yet been implemented.

### Mixed Wildcards
It is not yet possible to mix wildcards.

## Accessing the Genome of an Assembly
The examples in this section use the following Assembly Type and its GenBank Flat File.

In [None]:
// Load the BioProviders Project
#r "nuget: BioProviders, 1.0.0"

open BioProviders

// Generate the Assembly Type
let [<Literal>] Taxon = "bacteria"
let [<Literal>] Species = "Staphylococcus_lugdunensis"

type Assemblies = GenBankProvider<Taxon, Species>
type AssemblyType = Assemblies.``GCA_000185485.1_ASM18548v1``

// Load the GenBank Flat File for the Assembly
let gbff = AssemblyType.Genome()


### Sequence

**1. Get the Length of the Sequence**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Get the sequence length using the built-in Count property
let sequence_count = sequence.Count

// Get the sequence length using Seq.length
let sequence_length = sequence |> Seq.length

// Display both lengths
printfn "The sequence length using the Count property is: %d" sequence_count
printfn "The sequence length using Seq.length is: %d" sequence_length


The sequence length using the Count property is: 1265228
The sequence length using Seq.length is: 1265228


**2. Converting Sequence to String**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Convert the sequence to a string
let sequence_string = sequence.ToString()

// Display the string
printfn "The sequence as a string is: %s" sequence_string


The sequence as a string is: GTTATATTGGAGAATTAGCTCAGCTGGGAGAGCATCTGCCTTACAAGCAGAGGGTCGGCGGTTC... +[1265164]
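The output above is abbreviated. Only the first 64 bases are shown, followed by the number of remaining bases (1,265,228 − 64 = 1,265,164). If the full text is needed, the bytes can be converted directly. The `Seq.length` and `Item` examples in this notebook suggest that the sequence can be enumerated as ASCII bytes, and the sketch below relies on that assumption. `toFullString` is a hypothetical helper. A short stand-in sequence replaces `gbff.Sequence` so that the block runs on its own.

```fsharp
// Hypothetical helper: joins the bytes of a sequence into one string,
// assuming each base is stored as its ASCII byte value
let toFullString (bases: seq<byte>) : string =
    System.String(bases |> Seq.map char |> Seq.toArray)

// Stand-in for gbff.Sequence: the first 20 bases as ASCII bytes
let standIn = "GTTATATTGGAGAATTAGCT" |> Seq.map byte

printfn "%s" (toFullString standIn)
```

Applied to `gbff.Sequence`, this should return all 1,265,228 bases as one string, which is usually too long to print.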


**3. Get Sequence Complement**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Get the complement and convert to a string
let complemented_sequence = sequence.GetComplementedSequence()
let sequence_string = complemented_sequence.ToString()

// Display the string
printfn "The complemented sequence is: %s" sequence_string


The complemented sequence is: CAATATAACCTCTTAATCGAGTCGACCCTCTCGTAGACGGAATGTTCGTCTCCCAGCCGCCAAG... +[1265164]
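Complementing replaces each base with its Watson-Crick partner (A with T, C with G) and keeps the bases in their original order. The sketch below applies that rule to the first 20 bases on a plain string. Its result, `CAATATAACCTCTTAATCGA`, matches the start of the complemented output above. The sketch handles only A, C, G and T. GenBank records can also contain ambiguity codes such as N, so this is a simplification and not a replacement for `GetComplementedSequence`.

```fsharp
// Watson-Crick pairing for the four unambiguous DNA bases only
let complementBase (b: char) : char =
    match b with
    | 'A' -> 'T'
    | 'T' -> 'A'
    | 'G' -> 'C'
    | 'C' -> 'G'
    | other -> failwithf "Unsupported base: %c" other

// Complement every base without changing the order
let complement (dna: string) : string =
    String.map complementBase dna

printfn "%s" (complement "GTTATATTGGAGAATTAGCT")
```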


**4. Get Reverse Sequence**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Get the reverse and convert to a string
let reversed_sequence = sequence.GetReversedSequence()
let sequence_string = reversed_sequence.ToString()

// Display the string
printfn "The reversed sequence is: %s" sequence_string


The reversed sequence is: CGACTGCATAAGACTGGATATCTTTTATGTTTAATTTACGCTATGAAGTTTTTACTTTTTCAAT... +[1265164]


**5. Get Reverse-Complement Sequence**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Get the reverse-complement and convert to a string
let reverse_complemented_sequence = sequence.GetReverseComplementedSequence()
let sequence_string = reverse_complemented_sequence.ToString()

// Display the string
printfn "The reverse-complement sequence is: %s" sequence_string


The reverse-complement sequence is: GCTGACGTATTCTGACCTATAGAAAATACAAATTAAATGCGATACTTCAAAAATGAAAAAGTTA... +[1265164]
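The reverse-complement is the complement of the reversed sequence, so the two previous outputs are related. Complementing the first bases of the reversed output, `CGACTGCATAAG`, gives `GCTGACGTATTC`, which are the first bases of the reverse-complement output. The string-based sketch below checks this relationship and shows the composition on the first 20 bases. Like the complement sketch, it covers only A, C, G and T and is not the BioProviders implementation.

```fsharp
// Watson-Crick pairing for the four unambiguous DNA bases only
let complementBase (b: char) : char =
    match b with
    | 'A' -> 'T' | 'T' -> 'A' | 'G' -> 'C' | 'C' -> 'G'
    | other -> failwithf "Unsupported base: %c" other

// Reverse the order of the bases
let reverse (dna: string) : string =
    System.String(Array.rev (dna.ToCharArray()))

// Reverse-complement: reverse first, then complement each base
let reverseComplement (dna: string) : string =
    dna |> reverse |> String.map complementBase

// The complement of the reversed output's start is the reverse-complement's start
printfn "%s" (String.map complementBase "CGACTGCATAAG")
printfn "%s" (reverseComplement "GTTATATTGGAGAATTAGCT")
```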


**6. Get Sub-Sequence**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Get the sub-sequence from a 0-based start index and a length (both int64)
let first = 30L
let length = 50L
let subsequence = sequence.GetSubSequence first length
let sequence_string = subsequence.ToString()

// Display the string
printfn "The sub-sequence starting at %d with length %d is: %s" first length sequence_string


The sub-sequence starting at 30 with length 50 is: AGCATCTGCCTTACAAGCAGAGGGTCGGCGGTTCGAACCCGTCATTCTCC
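The second argument of `GetSubSequence` acts as a length rather than an end index. Called with 30 and 50, it returns 50 bases starting at 0-based position 30. `String.Substring` uses the same start-and-length convention, so the sketch below uses it to reproduce the start of that result. The first 64 bases come from the `ToString` output earlier in this section, which lets the block run without BioProviders.

```fsharp
// The first 64 bases, copied from the ToString output of the sequence
let prefix = "GTTATATTGGAGAATTAGCTCAGCTGGGAGAGCATCTGCCTTACAAGCAGAGGGTCGGCGGTTC"

// Substring takes a 0-based start index and a length
let start = 30
let length = 20
let sub = prefix.Substring(start, length)

printfn "%s" sub
```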


**7. Get Sequence Item**

In [None]:
// Extract the sequence from the GenBank Flat File
let sequence = gbff.Sequence

// Get a sequence item stored as a byte
let pos = 10
let sequence_item = sequence.Item pos

// Display the item
printfn "The sequence item at position %d is the byte %O, which corresponds to the character %c" pos sequence_item (char(sequence_item))


The sequence item at position 10 is the byte 65, which corresponds to the character A
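`Item` returns a base as a byte, and `char` converts that byte to its ASCII character. Positions are counted from 0. The sketch below shows both conversions on a plain string and indexes the first 20 bases of the sequence at position 10, which holds an `A`.

```fsharp
// A base stored as a byte is its ASCII code
let asByte = byte 'A'
let asChar = char 65uy

// Indexing is 0-based: position 10 of the first 20 bases
let first20 = "GTTATATTGGAGAATTAGCT"
let item = first20.[10]

printfn "byte %d -> char %c; position 10 holds %c" asByte asChar item
```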


### Metadata
Access to genome metadata has not yet been implemented.