# Class 8: Filtering data, slicing and dicing

&nbsp;  
Filtering and slicing data is an important step in data science, as it allows you to create smaller subsets of data which you can then use for more specialised analyses. In today's class, we'll be exploring different ways we can do this.

Before we begin, let's remind ourselves how indexing works in Python.

<div class="alert alert-block alert-info">
<b>Remember:</b> Python indexing starts at 0. This means that the first element in an object is located at position 0.   
</div>

&nbsp;  
<div>
<img src="attachment:Indexing%20image_cropped.png" width='80%' title=""/>
</div>
&nbsp; 

If `a = [14, 6, 29, 3.5, 11]`...

- what is `a[0]`?
- what is `a[3]`?
- why does `a[8]` return an error?
- why does `a[len(a)]` return an error?

> - `a[0]` is 14
> - `a[3]` is 3.5
> - `a` has only 5 elements (positions 0 to 4), so there is no element in position 8
> - `len(a)` is the number of elements in `a` (which is 5), and there is no element in position 5. The last valid position is `len(a) - 1`.
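
The answers above can be checked with a short sketch like the one below. It only uses the list `a` from the question; the variable names (`first`, `fourth`, `message` and so on) are made up for illustration.

```python
a = [14, 6, 29, 3.5, 11]

first = a[0]          # position 0 holds the first element
fourth = a[3]         # position 3 holds the fourth element
last = a[len(a) - 1]  # the last valid position is len(a) - 1
also_last = a[-1]     # negative positions count back from the end

try:
    a[len(a)]         # position 5 does not exist in a 5-element list
except IndexError as err:
    message = str(err)

print(first, fourth, last, also_last)
print(message)
```

Catching the `IndexError` with `try`/`except` is just a way of showing the error message without stopping the notebook.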

## Load the modules

First, let's import pandas.

In [1]:
import pandas as pd

## Selecting columns based on column names, or rows based on index value

&nbsp; 
<div>
<img src="attachment:field-6374781_640.jpg" width='50%' title=""/>
</div>
&nbsp; 

We can make use of column names and row indexes to easily select columns and rows from a dataframe.

First, read in the count.csv file from the Datasets folder. Call it 'count'.

In [2]:
count = pd.read_csv('../Datasets/count.csv')

Now let's print it out so we can see what it looks like.

In [3]:
count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
0,Waun_fach,45,44,103,521
1,Ffos_fawr,5,2,233,324
2,Aberheidol,67,23,432,734
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734
6,Llanwenant,55,3,4,128
7,Heol-y-bryn,1,11,543,223
8,Lan-y-mor,2,34,723,316
9,Pen-y-garn,67,3,126,402


### Columns

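To select columns from our dataframe, we use their names. A single name in single brackets, `dataframe['column name']`, selects one column, while a list of names in double brackets, `dataframe[['column name 1', 'column name 2']]`, selects several. Here are some examples...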

In [4]:
barley = count['Barley']
barley

0    103
1    233
2    432
3    612
4    332
5     12
6      4
7    543
8    723
9    126
Name: Barley, dtype: int64
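
Notice that the output above is not laid out like a table. As a rough illustration, the sketch below uses a small stand-in dataframe (called `demo` here, which is an assumption and not part of the Datasets folder) to show that single brackets return a pandas Series, while double brackets return a DataFrame.

```python
import pandas as pd

# Small stand-in for the count dataset (hypothetical, three rows only)
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Aberheidol'],
    'Barley': [103, 233, 432],
    'Oats': [521, 324, 734],
})

single = demo['Barley']      # one name in single brackets -> Series
double = demo[['Barley']]    # a list of names (double brackets) -> DataFrame

print(type(single).__name__)
print(type(double).__name__)
print(double.shape)          # (rows, columns)
```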

In [5]:
barley_oats = count[['Barley', 'Oats']]
barley_oats

Unnamed: 0,Barley,Oats
0,103,521
1,233,324
2,432,734
3,612,123
4,332,243
5,12,734
6,4,128
7,543,223
8,723,316
9,126,402


What happens if we try to select a column that doesn't exist? You will notice there is a small typo in the code below. What happens when you try to run it?

In [6]:
barley = count['barley']
barley

KeyError: 'barley'

At the bottom of the error message, you will notice the phrase `KeyError: 'barley'`, which tells us where the problem is: column names are case-sensitive, so `'barley'` does not match the column `'Barley'`.
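
If you are not sure how a column name is spelled, one option is to look at the dataframe's `columns` attribute before selecting. The sketch below assumes a tiny stand-in dataframe called `demo` (hypothetical, not one of the class datasets).

```python
import pandas as pd

# Hypothetical two-column dataframe for illustration
demo = pd.DataFrame({'Barley': [103, 233], 'Oats': [521, 324]})

names = list(demo.columns)             # all column names, in order
has_lower = 'barley' in demo.columns   # lower-case spelling is not a column
has_upper = 'Barley' in demo.columns   # exact spelling is a column

try:
    demo['barley']
except KeyError as err:
    missing = err.args[0]              # the name pandas could not find

print(names, has_lower, has_upper, missing)
```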

Select the Sheep and Goats columns.

In [7]:
sheep_goats = count[['Sheep', 'Goats']]
sheep_goats

Unnamed: 0,Sheep,Goats
0,45,44
1,5,2
2,67,23
3,8,6
4,23,7
5,4,9
6,55,3
7,1,11
8,2,34
9,67,3


Now select the Sheep and Goats columns again, but flip the order. What happens to the output?

In [8]:
sheep_goats = count[['Goats', 'Sheep']]
sheep_goats

Unnamed: 0,Goats,Sheep
0,44,45
1,2,5
2,23,67
3,6,8
4,7,23
5,9,4
6,3,55
7,11,1
8,34,2
9,3,67


### Rows

To select rows based on index value, we use the syntax `dataframe[start:stop]`. When we select rows in this way, the 'stop' index is **not** included. Therefore, the 'stop' value must be 1 more than the last row we want. For example, to select rows 1 to 4, we would need to use the code...

In [9]:
subset_count = count[1:5]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
1,Ffos_fawr,5,2,233,324
2,Aberheidol,67,23,432,734
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243


And to select the first 7 rows i.e. rows 0-6, we would use the code...

In [10]:
subset_count = count[:7] # note: we do not need to include the zero here i.e. count[0:7]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
0,Waun_fach,45,44,103,521
1,Ffos_fawr,5,2,233,324
2,Aberheidol,67,23,432,734
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734
6,Llanwenant,55,3,4,128


Select rows 2-8 inclusive from the count dataset.

In [11]:
subset_count = count[2:9]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
2,Aberheidol,67,23,432,734
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734
6,Llanwenant,55,3,4,128
7,Heol-y-bryn,1,11,543,223
8,Lan-y-mor,2,34,723,316


Now select rows 3-5 inclusive from the count dataset.

In [12]:
subset_count = count[3:6]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734


We can select the last row using `subset_count = count[-1:]`. Write code in the cell below to select the last 3 rows.

In [13]:
subset_count = count[-3:]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
7,Heol-y-bryn,1,11,543,223
8,Lan-y-mor,2,34,723,316
9,Pen-y-garn,67,3,126,402


Now select all the rows except the last one.

In [14]:
subset_count = count[:-1]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
0,Waun_fach,45,44,103,521
1,Ffos_fawr,5,2,233,324
2,Aberheidol,67,23,432,734
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734
6,Llanwenant,55,3,4,128
7,Heol-y-bryn,1,11,543,223
8,Lan-y-mor,2,34,723,316
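
As a quick cross-check on the slices above, the sketch below compares `[:n]` and `[-n:]` with the pandas methods `head` and `tail`, which return the first and last rows respectively. The dataframe `demo` is a hypothetical stand-in with a default 0, 1, 2... index.

```python
import pandas as pd

# Hypothetical five-row dataframe with the default integer index
demo = pd.DataFrame({'Sheep': [45, 5, 67, 8, 23]})

first_three = demo[:3]   # rows 0, 1 and 2 (stop is excluded)
last_two = demo[-2:]     # rows 3 and 4

same_head = first_three.equals(demo.head(3))
same_tail = last_two.equals(demo.tail(2))

print(same_head, same_tail)
print(list(last_two.index))
```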


## Slicing

When we slice a dataframe, we extract a section of that dataframe. The slicing can be done across columns, or across rows, or both. Pandas has two main methods for doing this:

- `.iloc[]` is integer-position based, which means we have to pass the **integer location** of the row or column we want to slice.
- `.loc[]` is label-based which means that we have to pass the **label** of the row or column we want to slice.

We'll practice each of these in turn.

### .iloc

`.iloc[]` uses the syntax `df.iloc[row index range, column index range]`.  For example:

- `df.iloc[2:4]` returns rows 2 and 3
- `df.iloc[:5]` returns rows 0, 1, 2, 3 and 4
- `df.iloc[:, 1]` returns column 1
- `df.iloc[1:3, 5]` returns rows 1 and 2 in column 5
- `df.iloc[:2, :3]` returns the values in rows 0 and 1, columns 0, 1 and 2
- `df.iloc[2,5]` returns the value in row 2, column 5
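
The kind of object that `.iloc[]` returns depends on what is passed in. The following is a minimal sketch on a hypothetical three-row dataframe (`demo`), showing that ranges give a DataFrame, a single column position gives a Series, and a single row plus a single column gives one value.

```python
import pandas as pd

# Hypothetical stand-in dataframe
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Aberheidol'],
    'Sheep': [45, 5, 67],
    'Goats': [44, 2, 23],
})

block = demo.iloc[:2, :2]     # rows 0-1, columns 0-1 -> DataFrame
column = demo.iloc[:, 1]      # all rows, column 1 -> Series
one_col = demo.iloc[:, 1:2]   # a range of one column -> DataFrame
value = demo.iloc[2, 1]       # row 2, column 1 -> single value

print(block.shape, one_col.shape, value)
```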

Let's try some of these with the count dataset. Look at the code in the cells below and think about what you expect the output to be. Then run the cells to see if you were right.

In [15]:
count.iloc[1:3]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
1,Ffos_fawr,5,2,233,324
2,Aberheidol,67,23,432,734


In [16]:
count.iloc[:7]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
0,Waun_fach,45,44,103,521
1,Ffos_fawr,5,2,233,324
2,Aberheidol,67,23,432,734
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734
6,Llanwenant,55,3,4,128


In [17]:
count.iloc[:, 2:3]

Unnamed: 0,Goats
0,44
1,2
2,23
3,6
4,7
5,9
6,3
7,11
8,34
9,3


In [18]:
count.iloc[:, 1:5]

Unnamed: 0,Sheep,Goats,Barley,Oats
0,45,44,103,521
1,5,2,233,324
2,67,23,432,734
3,8,6,612,123
4,23,7,332,243
5,4,9,12,734
6,55,3,4,128
7,1,11,543,223
8,2,34,723,316
9,67,3,126,402


In [19]:
count.iloc[5:9, 4]

5    734
6    128
7    223
8    316
Name: Oats, dtype: int64

In [20]:
count.iloc[2:4, :3]

Unnamed: 0,Field,Sheep,Goats
2,Aberheidol,67,23
3,Hen_cae,8,6


In [21]:
count.iloc[3:8, -1]

3    123
4    243
5    734
6    128
7    223
Name: Oats, dtype: int64

In [22]:
count.iloc[8,1]

2

In [23]:
count.iloc[[0,8],1:]

Unnamed: 0,Sheep,Goats,Barley,Oats
0,45,44,103,521
8,2,34,723,316


In [24]:
count.iloc[4:-2, 2:5]

Unnamed: 0,Goats,Barley,Oats
4,7,332,243
5,9,12,734
6,3,4,128
7,11,543,223


### .loc

We use `.loc[]` when we want to refer to columns (or rows) by their names, rather than their index. `.loc[]` uses the same syntax as `.iloc[]` but there is one important difference - both the `start` **and** the `stop` are **inclusive**. 

Let's run through some examples...

In [25]:
count.loc[2:4,'Sheep']

2    67
3     8
4    23
Name: Sheep, dtype: int64

In [26]:
count.loc[:, 'Sheep':'Barley']

Unnamed: 0,Sheep,Goats,Barley
0,45,44,103
1,5,2,233
2,67,23,432
3,8,6,612
4,23,7,332
5,4,9,12
6,55,3,4
7,1,11,543
8,2,34,723
9,67,3,126


In [27]:
count.loc[1:9, ['Field', 'Oats']]

Unnamed: 0,Field,Oats
1,Ffos_fawr,324
2,Aberheidol,734
3,Hen_cae,123
4,Glynan,243
5,Derwen,734
6,Llanwenant,128
7,Heol-y-bryn,223
8,Lan-y-mor,316
9,Pen-y-garn,402


In [28]:
count.loc[2, 'Field']

'Aberheidol'
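
In the examples above the row labels happen to be integers, so `.loc[]` can look a lot like `.iloc[]`. The difference is easier to see when the rows have text labels. The sketch below is an illustration only: it builds a hypothetical dataframe `demo` and uses `set_index` to turn the 'Field' column into the row labels.

```python
import pandas as pd

# Hypothetical stand-in dataframe, with field names used as row labels
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Aberheidol', 'Hen_cae'],
    'Sheep': [45, 5, 67, 8],
}).set_index('Field')

# .loc slices by label and includes both ends -> 3 rows
by_label = demo.loc['Ffos_fawr':'Hen_cae', 'Sheep']

# .iloc slices by position and excludes the stop -> 2 rows
by_position = demo.iloc[1:3, 0]

print(list(by_label))
print(list(by_position))
```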

Let's practice this with another dataset. Load the surveys.csv file from the Datasets folder. Call it 'surveys'.

In [29]:
surveys = pd.read_csv('../Datasets/surveys.csv')

Select the values in the first 5 rows (i.e. rows 0-4) and in the 'plot_id' column.

In [30]:
surveys.loc[0:4, 'plot_id']

0    2
1    3
2    2
3    7
4    3
Name: plot_id, dtype: int64

Select the values in rows 3356 - 3366 (inclusive) and in the 'weight' column.

In [31]:
surveys.loc[3356:3366, 'weight']

3356    150.0
3357    130.0
3358    142.0
3359     26.0
3360     49.0
3361      8.0
3362     18.0
3363      NaN
3364      NaN
3365      NaN
3366     41.0
Name: weight, dtype: float64

Select the values in rows 490 and 495, and in the 'sex' and 'hindfoot_length' columns.

In [32]:
surveys.loc[[490, 495], ['sex', 'hindfoot_length']]

Unnamed: 0,sex,hindfoot_length
490,M,48.0
495,F,36.0


Select the value in row 12793 and in the 'species_id' column.

In [33]:
surveys.loc[12793, 'species_id']

'AB'

## Subsetting using criteria

&nbsp; 
<div>
<img src="attachment:sheeps-3437467_640.jpg" width='50%' title=""/>
</div>
&nbsp; 

We can also use pandas to select rows based on particular column values. For example, we might want to know which fields contain more than 10 sheep, or fewer than 20 goats. Here are some examples...

In [34]:
count[count['Sheep'] > 10]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
0,Waun_fach,45,44,103,521
2,Aberheidol,67,23,432,734
4,Glynan,23,7,332,243
6,Llanwenant,55,3,4,128
9,Pen-y-garn,67,3,126,402


In [35]:
count[count['Goats'] < 20]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
1,Ffos_fawr,5,2,233,324
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
5,Derwen,4,9,12,734
6,Llanwenant,55,3,4,128
7,Heol-y-bryn,1,11,543,223
9,Pen-y-garn,67,3,126,402


In [36]:
count[(count['Sheep'] > 10) & (count['Goats'] < 20)]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
4,Glynan,23,7,332,243
6,Llanwenant,55,3,4,128
9,Pen-y-garn,67,3,126,402


In [37]:
count[count['Barley'] > count['Oats']]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats
3,Hen_cae,8,6,612,123
4,Glynan,23,7,332,243
7,Heol-y-bryn,1,11,543,223
8,Lan-y-mor,2,34,723,316


<div class="alert alert-block alert-info">
<b>Note:</b> Some of the operators you might find useful are:
<ul>
<li>equals: ==</li>  
<li>does not equal: !=</li>
<li>is greater than: ></li>
<li>is less than: <</li>
<li>is greater than or equal to: >=</li>
<li>is less than or equal to: <=</li>
</ul>
</div>
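
It can help to look at what a comparison produces before it is used as a filter. In the sketch below (on a hypothetical dataframe `demo`), the comparison gives a Series of `True`/`False` values, often called a mask, and the dataframe keeps only the rows where the mask is `True`.

```python
import pandas as pd

# Hypothetical stand-in dataframe
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Aberheidol'],
    'Sheep': [45, 5, 67],
})

mask = demo['Sheep'] > 10    # one True/False value per row
n_matching = mask.sum()      # True counts as 1, so this counts matching rows
subset = demo[mask]          # keeps only the rows where mask is True

print(list(mask))
print(n_matching)
print(list(subset['Field']))
```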

Let's add 2 new columns to the count dataset.

In [38]:
count["Soil"] = ["Sand","Loam","Loam","Clay","Clay","Loam","Sand","Sand","Clay","Clay"]
count["Drainage"] = ["Good", "OK", "Poor", "Poor", "Poor", "Good", "Good", "OK", "OK","Poor"]

In the cell below, write code to extract the rows from the count dataset that contain data from clay soils, and that contain barley values less than or equal to 500. How many rows did you end up with? Compare with your neighbours.

In [39]:
subset_count = count[(count['Soil'] == 'Clay') & (count['Barley'] <= 500)]
subset_count

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
4,Glynan,23,7,332,243,Clay,Poor
9,Pen-y-garn,67,3,126,402,Clay,Poor


Another useful way to filter dataframes is to extract rows that contain values within a specified list. To do this, we use the `isin` method. For example, we could select the rows from the count dataframe that contain the soils 'Clay' or 'Loam' using `count[count['Soil'].isin(['Clay', 'Loam'])]`. Try it below.

In [40]:
count[count['Soil'].isin(['Clay', 'Loam'])]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
1,Ffos_fawr,5,2,233,324,Loam,OK
2,Aberheidol,67,23,432,734,Loam,Poor
3,Hen_cae,8,6,612,123,Clay,Poor
4,Glynan,23,7,332,243,Clay,Poor
5,Derwen,4,9,12,734,Loam,Good
8,Lan-y-mor,2,34,723,316,Clay,OK
9,Pen-y-garn,67,3,126,402,Clay,Poor


Using the `.isin` method, select all rows from the count dataset that have good or OK drainage.

In [41]:
count[count['Drainage'].isin(['Good', 'OK'])]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
0,Waun_fach,45,44,103,521,Sand,Good
1,Ffos_fawr,5,2,233,324,Loam,OK
5,Derwen,4,9,12,734,Loam,Good
6,Llanwenant,55,3,4,128,Sand,Good
7,Heol-y-bryn,1,11,543,223,Sand,OK
8,Lan-y-mor,2,34,723,316,Clay,OK


Now select all rows that have sand or clay soils, and have fewer than 400 oats.

In [42]:
count[(count['Soil'].isin(['Sand', 'Clay'])) & (count['Oats'] < 400)]

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
3,Hen_cae,8,6,612,123,Clay,Poor
4,Glynan,23,7,332,243,Clay,Poor
6,Llanwenant,55,3,4,128,Sand,Good
7,Heol-y-bryn,1,11,543,223,Sand,OK
8,Lan-y-mor,2,34,723,316,Clay,OK
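
Sometimes it is easier to say which values you do not want. One way to do this, sketched below on a hypothetical dataframe `demo`, is to put the `~` (not) operator in front of the `isin` condition, which flips every `True` to `False` and vice versa.

```python
import pandas as pd

# Hypothetical stand-in dataframe
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Hen_cae', 'Glynan'],
    'Soil': ['Sand', 'Loam', 'Clay', 'Clay'],
})

is_clay = demo['Soil'].isin(['Clay'])   # True for the clay rows
not_clay = demo[~is_clay]               # ~ inverts the mask

print(list(not_clay['Soil']))
```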


Another way of subsetting a dataframe based on criteria is to use the `query` method. Here are some examples...

In [43]:
count.query('Field == "Lan-y-mor"')

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
8,Lan-y-mor,2,34,723,316,Clay,OK


In [44]:
count.query('Sheep > 10 & Barley < 500')

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
0,Waun_fach,45,44,103,521,Sand,Good
2,Aberheidol,67,23,432,734,Loam,Poor
4,Glynan,23,7,332,243,Clay,Poor
6,Llanwenant,55,3,4,128,Sand,Good
9,Pen-y-garn,67,3,126,402,Clay,Poor


Use the `query` method to extract the rows that have more than 10 goats and sandy soils.

In [45]:
count.query('Goats > 10 & Soil == "Sand"')

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
0,Waun_fach,45,44,103,521,Sand,Good
7,Heol-y-bryn,1,11,543,223,Sand,OK


Use the `query` method to extract the rows that have more than 100 barley, less than 500 oats and OK drainage.

In [46]:
count.query('Barley > 100 & Oats < 500 & Drainage == "OK"')

Unnamed: 0,Field,Sheep,Goats,Barley,Oats,Soil,Drainage
1,Ffos_fawr,5,2,233,324,Loam,OK
7,Heol-y-bryn,1,11,543,223,Sand,OK
8,Lan-y-mor,2,34,723,316,Clay,OK
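
Two further features of `query` may be handy. A Python variable can be referred to inside the query string by putting `@` in front of its name, and a column name containing spaces can be wrapped in backticks. The sketch below assumes a hypothetical dataframe `demo` and a made-up threshold `min_sheep`; the backtick behaviour with brackets in the name is assumed to need a reasonably recent pandas version (1.0 or later).

```python
import pandas as pd

# Hypothetical stand-in dataframe; one column name contains spaces
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Aberheidol'],
    'Sheep': [45, 5, 67],
    'Av rainfall (mm)': [880, 230, 875],
})

min_sheep = 10   # made-up threshold stored in an ordinary variable

# @min_sheep refers to the Python variable defined above
many_sheep = demo.query('Sheep > @min_sheep')

# Backticks let query refer to a column name that contains spaces
wet = demo.query('`Av rainfall (mm)` >= 500')

print(list(many_sheep['Field']))
print(list(wet['Field']))
```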


## Subsetting using `groupby`

`groupby` is another really useful pandas method. It allows us to split a dataframe into groups and then perform a function on each of those groups. For example...

- `count.groupby('Soil').mean(numeric_only = True)`
- `count.groupby('Drainage').max(numeric_only = True)`

In these examples, the name in brackets (e.g. 'Soil') tells pandas how to form the groups. In the first example we are grouping by soil type and in the second example we are grouping by drainage type. Then we add the function we want to apply to each group (mean, max, min, sum etc). The `numeric_only = True` argument restricts the calculation to the numeric columns. Depending on your pandas version, leaving it out either produces a warning (older versions) or, for functions such as `mean` that cannot be applied to text columns like 'Field', an error (pandas 2.0 and later).

Copy these two examples into the cells below and examine the output. Make sure you understand what the function is doing.

In [47]:
count.groupby('Soil').mean(numeric_only = True)

Soil,Sheep,Goats,Barley,Oats
Clay,25.0,12.5,448.25,271.0
Loam,25.333333,11.333333,225.666667,597.333333
Sand,33.666667,19.333333,216.666667,290.666667


In [48]:
count.groupby('Drainage').max(numeric_only = True)

Drainage,Sheep,Goats,Barley,Oats
Good,55,44,103,734
OK,5,34,723,324
Poor,67,23,612,734
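
The examples above summarise every numeric column at once. As a rough sketch of two other things `groupby` can do, the code below picks out one column before summarising, and uses `get_group` to pull out the rows belonging to a single group. The dataframe `demo` is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical stand-in dataframe
demo = pd.DataFrame({
    'Field': ['Waun_fach', 'Ffos_fawr', 'Aberheidol', 'Hen_cae'],
    'Sheep': [45, 5, 67, 8],
    'Soil': ['Sand', 'Loam', 'Loam', 'Clay'],
})

# Total number of sheep for each soil type (a Series indexed by soil)
sheep_by_soil = demo.groupby('Soil')['Sheep'].sum()

# All of the rows that fall in the 'Loam' group
loam_rows = demo.groupby('Soil').get_group('Loam')

print(sheep_by_soil)
print(list(loam_rows['Field']))
```

Selecting a single numeric column first also means `numeric_only` is not needed for that calculation.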


## If you have time in class, or for homework...

Read in the Greenland_climate.csv dataset from the Datasets folder. Then...

- select the 'Av rainfall (mm)' and 'max rainfall (mm)' columns
- select rows 1-5 inclusive
- select the values in rows 2-4 and columns 1-6 inclusive
- select the values in rows 2 and 5, and the 'max sea temp ( °C)' column
- select the rows with an average rainfall of greater than or equal to 200mm
- select the rows with an average temperature of greater than -4 °C, and average sunshine of greater than 2.5 hours
- use `isin` to select the Nuuk, Qaanaq and Sisimiut rows

In [49]:
climate = pd.read_csv('../Datasets/Greenland_climate.csv')
climate

Unnamed: 0,Town,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C),Av sunshine (hours)
0,Tasiilaq,1.95,880,3.0,-10.4,80,5,5.1
1,Ittoqqortoormiit,-4.3,230,1.0,-18.4,30,2,2.5
2,Nuuk,-3.2,875,1.8,-10.9,105,4,3.9
3,Ilulissat,-3.85,270,1.0,-18.8,40,4,4.4
4,Qaanaq,-4.1,120,0.0,-29.1,20,1,2.1
5,Maniitsoq,-3.9,180,1.0,-23.2,55,3,2.8
6,Sisimiut,-3.3,130,1.0,-25.6,50,2,2.6


In [50]:
subset_climate = climate[['Av rainfall (mm)', 'max rainfall (mm)']]
subset_climate

Unnamed: 0,Av rainfall (mm),max rainfall (mm)
0,880,80
1,230,30
2,875,105
3,270,40
4,120,20
5,180,55
6,130,50


In [51]:
subset_climate = climate[1:6]
subset_climate

Unnamed: 0,Town,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C),Av sunshine (hours)
1,Ittoqqortoormiit,-4.3,230,1.0,-18.4,30,2,2.5
2,Nuuk,-3.2,875,1.8,-10.9,105,4,3.9
3,Ilulissat,-3.85,270,1.0,-18.8,40,4,4.4
4,Qaanaq,-4.1,120,0.0,-29.1,20,1,2.1
5,Maniitsoq,-3.9,180,1.0,-23.2,55,3,2.8


In [52]:
climate.iloc[2:5, 1:7]

Unnamed: 0,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C)
2,-3.2,875,1.8,-10.9,105,4
3,-3.85,270,1.0,-18.8,40,4
4,-4.1,120,0.0,-29.1,20,1


In [53]:
climate.loc[[2, 5], 'max sea temp ( °C)']

2    4
5    3
Name: max sea temp ( °C), dtype: int64

In [54]:
climate[climate['Av rainfall (mm)'] >= 200]

# Note: to use climate.query here, wrap the column name in backticks because it contains whitespace,
# e.g. climate.query('`Av rainfall (mm)` >= 200')

Unnamed: 0,Town,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C),Av sunshine (hours)
0,Tasiilaq,1.95,880,3.0,-10.4,80,5,5.1
1,Ittoqqortoormiit,-4.3,230,1.0,-18.4,30,2,2.5
2,Nuuk,-3.2,875,1.8,-10.9,105,4,3.9
3,Ilulissat,-3.85,270,1.0,-18.8,40,4,4.4


In [55]:
climate[(climate['Av temp ( °C)'] > -4) & (climate['Av sunshine (hours)'] > 2.5)]

Unnamed: 0,Town,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C),Av sunshine (hours)
0,Tasiilaq,1.95,880,3.0,-10.4,80,5,5.1
2,Nuuk,-3.2,875,1.8,-10.9,105,4,3.9
3,Ilulissat,-3.85,270,1.0,-18.8,40,4,4.4
5,Maniitsoq,-3.9,180,1.0,-23.2,55,3,2.8
6,Sisimiut,-3.3,130,1.0,-25.6,50,2,2.6


In [56]:
climate[climate['Town'].isin(['Nuuk', 'Qaanaq', 'Sisimiut'])]

Unnamed: 0,Town,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C),Av sunshine (hours)
2,Nuuk,-3.2,875,1.8,-10.9,105,4,3.9
4,Qaanaq,-4.1,120,0.0,-29.1,20,1,2.1
6,Sisimiut,-3.3,130,1.0,-25.6,50,2,2.6


Here's an extra challenge! So far, we have selected rows based on multiple criteria using '&'. But what happens if we want to use 'or' instead of 'and'? For example, we might want to select the rows that have an average rainfall of greater than 500mm **or** have fewer than 4 sunshine hours.

Search the internet (hint: search for *python bitwise operators*), and see if you can identify which symbol you would need to do this. Then have a go at selecting the rows that have an average rainfall of greater than 500mm or have fewer than 4 sunshine hours.

In [57]:
climate[(climate['Av rainfall (mm)'] > 500) | (climate['Av sunshine (hours)'] < 4)]

Unnamed: 0,Town,Av temp ( °C),Av rainfall (mm),Av sea temp ( °C),min temp ( °C),max rainfall (mm),max sea temp ( °C),Av sunshine (hours)
0,Tasiilaq,1.95,880,3.0,-10.4,80,5,5.1
1,Ittoqqortoormiit,-4.3,230,1.0,-18.4,30,2,2.5
2,Nuuk,-3.2,875,1.8,-10.9,105,4,3.9
4,Qaanaq,-4.1,120,0.0,-29.1,20,1,2.1
5,Maniitsoq,-3.9,180,1.0,-23.2,55,3,2.8
6,Sisimiut,-3.3,130,1.0,-25.6,50,2,2.6
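
To see the difference between 'or' and 'and' side by side, here is a small sketch on a hypothetical dataframe `demo` with shortened, made-up column names (`rain` and `sun`). Each comparison is wrapped in its own parentheses because `|` and `&` are evaluated before `>` and `<` in Python, so leaving the parentheses out changes the meaning of the expression.

```python
import pandas as pd

# Hypothetical stand-in dataframe with shortened column names
demo = pd.DataFrame({
    'Town': ['Tasiilaq', 'Ilulissat', 'Qaanaq'],
    'rain': [880, 270, 120],
    'sun': [5.1, 4.4, 2.1],
})

# | keeps rows where at least one condition is True
either = demo[(demo['rain'] > 500) | (demo['sun'] < 4)]

# & keeps rows where both conditions are True
both = demo[(demo['rain'] > 500) & (demo['sun'] < 4)]

print(list(either['Town']))
print(len(both))
```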
