
# Basic data structures: arrays and hashes

Perl course – Feb. 2017

# 1. Arrays

Ordered list of scalar data (elements) accessed through a numeric subscript (index), starting at 0

Declaring values of an array

- Mixing numbers and strings is possible

```perl
my @grades = ("John", 98, "Christian", 77, "Jule", 88);
```

- Creating an empty array

```perl
my @array = ();
```

- Using an existing array to create a new array

```perl
my @students = ("Alex", "Paul", "Burak");
my @updated_students = ("Marie", "Ana", @students);
```

- Accessing array elements
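
A minimal sketch of element access, reusing the `@updated_students` array from above (the names `$first`, `$last`, `$count` and `$last_index` are only illustrative). A single element is a scalar, so it is written with `$` and an index in square brackets:

```perl
use strict;
use warnings;

my @students         = ("Alex", "Paul", "Burak");
my @updated_students = ("Marie", "Ana", @students);

# A single element is a scalar: use $ and a zero-based index
my $first = $updated_students[0];      # Marie
my $last  = $updated_students[-1];     # Burak (negative indices count from the end)

# An array in scalar context gives its length; $#array is the last index
my $count      = scalar @updated_students;   # 5
my $last_index = $#updated_students;         # 4

print "$first $last $count $last_index\n";   # Marie Burak 5 4
```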

## Push - add one or more values to the end of an array 

- Add element in last position

```perl
push @updated_students, 'Jane';
print "@updated_students\n";     #Marie Ana Alex Paul Burak Jane
```

- Add an array to an array

```perl
my @others = ('Darth', 'Vader');
push @updated_students, @others;

print "@updated_students\n"; 
#Marie Ana Alex Paul Burak Jane Darth Vader
```
## Pop – remove the last element of an array
- Remove and return the last element of an array     

```perl
my $last = pop @updated_students; 

print "$last\n";  # Vader
print "@updated_students\n";    
# Marie Ana Alex Paul Burak Jane Darth
```

## Shift – remove the first element of an array
```perl
my $first = shift @updated_students;

print "$first\n";     # Marie
print "@updated_students\n";     
#Ana Alex Paul Burak Jane Darth
```
## Unshift - add elements at the beginning of the array

- Add one element
```perl
unshift @updated_students, 'Marie';
print "@updated_students\n";    
#Marie Ana Alex Paul Burak Jane Darth 
```

- Add a second array
```perl
my @others = ('Jens', 'Nina');
unshift @updated_students, @others;
print "@updated_students\n";     
#Jens Nina Marie Ana Alex Paul Burak Jane Darth
```
## Split - Cut a string into pieces and create an array

- Cuts a string at every match of a pattern (here a single space) and returns the pieces as a list

```perl
my $str = "ab cd ef gh ij";

my @words = split / /, $str;

```
* What would be the value of `$words[1]`?

- Useful when reading files:

```perl
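# Read the file line by line; FILE must be an already opened filehandle
while (<FILE>) {
    chomp;    # remove the trailing newline
    # Append the whitespace-separated fields of this line to @matrix
    push @matrix, split(/\s+/, $_);
}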
```
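
A self-contained sketch of the same idea. To stay runnable without an external file, it reads from an in-memory filehandle (`open` on a reference to a string); the sample data and the names `$text`, `$fh` and `@rows` are made up for illustration. Unlike the loop above, it wraps each split result in `[ ]` so that every line is kept as a separate row (array references are covered in section 2):

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for a real file
my $text = "1 2 3\n4 5 6\n";

# Open the string as if it were a file (in-memory filehandle)
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {
    chomp $line;                        # remove the trailing newline
    my @fields = split /\s+/, $line;    # cut the line at runs of whitespace
    push @rows, [@fields];              # store the fields as one row
}
close $fh;

print "Rows read: ", scalar(@rows), "\n";   # Rows read: 2
print "$rows[1][2]\n";                      # 6
```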

## Join – counterpart of split
- Join the elements of a list into one string, placing a separator between them




In [25]:
%%perl

my @names = ("Alex", "Paul", "Burak");
my $str = join ':', @names;

print $str. "\n";                       # Alex:Paul:Burak

#- Join strings by a character


my $data = join "-", $str, "names";

print $data. "\n";
# Alex:Paul:Burak-names


#- Join array elements and a string

my $str2 = join '', @names, 'Baz';
print $str2. "\n"; 



Alex:Paul:Burak
Alex:Paul:Burak-names
AlexPaulBurakBaz


# 2. Multidimensional arrays (e.g. matrix)
- Explicit declaration
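
A minimal sketch of an explicit declaration: the matrix is written as a list of anonymous arrays (`[ ... ]`), one per row. The values and the name `@matrix` are made up for illustration:

```perl
use strict;
use warnings;

# Each [ ... ] is an anonymous array (one row); @matrix holds references to them
my @matrix = (
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
);

# Element access: the first index is the row, the second is the column
print "$matrix[1][2]\n";        # 6

# A whole row is reached by dereferencing with @{ }
print "@{ $matrix[0] }\n";      # 1 2 3
```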

- Assign elements of a matrix with loops

In [26]:
%%perl

my @mat;
for($i=0; $i<5; $i++){
print "\n";
    for($j=0; $j<10; $j++){
        $mat[$i][$j] = $i * $j;
        print $mat[$i][$j]. " ";
    }
}



0 0 0 0 0 0 0 0 0 0 
0 1 2 3 4 5 6 7 8 9 
0 2 4 6 8 10 12 14 16 18 
0 3 6 9 12 15 18 21 24 27 
0 4 8 12 16 20 24 28 32 36 

In [27]:
%%perl
#- Assign entire rows of a matrix using push and an array reference constructor []

@dimension = (1,1,1,1,1,1,1,1,1,1);
push(@mat, [@dimension]);


#- To print we need to loop through the array

for my $array (@mat){
    print "@$array \n";
}


#- Add columns to an existing row

push @{ $mat[0] }, 40, 70;


1 1 1 1 1 1 1 1 1 1 


# 3. Hashes
A hash is an unordered group of key-value pairs and is declared with %

- Creating hash entries one by one
```perl
$colors_of{'apple'} = 'green';
```

- When accessing a specific key-value pair, we use \$ because we are accessing a single scalar. The key is placed in curly braces.
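
A minimal sketch of working with single key-value pairs, reusing the `%colors_of` name from above (the fruits and colors are only example values):

```perl
use strict;
use warnings;

my %colors_of;                          # empty hash

# Add entries one by one; each entry is a scalar, hence the $
$colors_of{'apple'} = 'green';
$colors_of{'grape'} = 'purple';

# Assigning to an existing key overwrites its value
$colors_of{'apple'} = 'red';

# Read a single value back
print "$colors_of{'apple'}\n";          # red

# delete removes a key-value pair
delete $colors_of{'grape'};

# keys in scalar context gives the number of pairs
print scalar(keys %colors_of), "\n";    # 1
```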

* Creation of hashes

## Creating a hash with a list




In [18]:
%%perl

my %colors_of = (
        "apple"  => "green",
        "orange" => "orange",
        "grape"  => "purple",);
        
## Hashes –values and keys
# Creates an array with the values of the hash


my @colors = values(%colors_of); 
print "Element one of colors array is $colors[0]\n";  # any one of the colors: hash order is not guaranteed
print "******************************************\n";


# Checking for the existence of keys

my $check_fruit = "apple";
if( exists($colors_of{$check_fruit} ) ){
   print "The $check_fruit is $colors_of{$check_fruit}\n";
}
else{
   print "I don't know the color of the $check_fruit\n";
}

print "******************************************\n";

## Hashes - keys
# The keys of the hash are stored in an array

my @fruits = keys %colors_of;
print "$fruits[0]\n";  # any one of the keys: hash order is not guaranteed

print "******************************************\n";

# The keys can be sorted and printed


foreach my $fruit (sort @fruits) {
    print "$fruit $colors_of{$fruit} \n";
}
#apple green
#grape purple
#orange orange



Element one of colors array is purple
******************************************
The apple is green
******************************************
grape
******************************************
apple green 
grape purple 
orange orange 


# 4. Combining arrays and hashes

- A hash of arrays: each amino acid (a.a.) is a key whose value is a reference to an array of its codons


In [3]:
%%perl
my %codons = (
 'F' => ['TTT','TTC'], 
 'L' => ['TTA','TTG','CTT','CTC','CTA','CTG'], 
 'S' => ['TCT','TCC','TCA','TCG','AGT','AGC'], 
 'Y' => ['TAT','TAC'], 
); 
# $codons{'L'} is a reference to an array

# To print the values

foreach my $amino (keys %codons) {                 # goes key by key
    print "$amino ";
    foreach my $codon ( @{ $codons{$amino} } ) {   # the codons associated with a key
        print "$codon,";
    }
    print "\n";
}

S TCT,TCC,TCA,TCG,AGT,AGC,
F TTT,TTC,
L TTA,TTG,CTT,CTC,CTA,CTG,
Y TAT,TAC,
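
Individual codons can also be reached directly. A short sketch under the same data layout (the hash is repeated in shortened form so the snippet runs on its own; the name `$n`, the use of `join`, and sorting the keys to get a reproducible print order are additions made here for illustration):

```perl
use strict;
use warnings;

my %codons = (
    'F' => ['TTT', 'TTC'],
    'Y' => ['TAT', 'TAC'],
);

# First codon of F: hash key first, then array index
print "$codons{'F'}[0]\n";               # TTT

# Number of codons for one amino acid: dereference, then scalar context
my $n = scalar @{ $codons{'Y'} };
print "$n\n";                            # 2

# Add a new amino acid with its own anonymous array
$codons{'M'} = ['ATG'];

# join avoids the trailing comma of the nested loop
foreach my $amino (sort keys %codons) {
    print "$amino ", join(',', @{ $codons{$amino} }), "\n";
}
# F TTT,TTC
# M ATG
# Y TAT,TAC
```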


# Exercises with arrays and hashes

Now it's your turn to apply what you just learned!

We previously created a Perl script to read a GenBank file and extract the locus tag, the protein sequence, and the nucleotide sequence of the scaffold. Now it's time to make things more interesting!

Let's fetch our GenBank file and produce a FASTA file with the protein sequences. The identifiers of the sequences should include the protein annotation.

## Tips:

- Make a hash with the locus tags and their respective sequences.

Questions: 
 - What would be your key?
 - What would be your value?

- Make a second hash with the locus tags and their annotations
 
Question:
- Can you think of a different data structure to solve this problem?






