
Commit 6b0a554

Merge pull request #19 from jetbrains-academy/sofia/find_unique_values
Added task Find Unique Values
2 parents 4040eec + 1dfbb3c commit 6b0a554


8 files changed: +193 / -0 lines changed


NumPy/Compare Search/Find Unique Values/__init__.py

Whitespace-only changes.
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
id,metric1,metric2,metric3,metric4,metric5,metric6,metric7,metric8
3,67,63,89,54,39,9,90,56
3,49,1,82,35,9,41,53,24
5,85,47,44,39,92,20,95,78
5,96,74,37,93,91,21,76,9
1,15,1,40,28,58,27,20,58
5,73,57,7,50,65,81,1,12
1,86,5,17,26,16,24,79,62
1,32,73,33,2,32,91,22,16
2,7,45,46,69,73,96,98,35
2,77,80,37,41,74,3,58,94
2,15,61,55,48,60,16,86,76
4,37,39,81,90,31,15,64,90
5,40,37,23,88,3,82,59,60
2,72,13,49,12,27,87,78,66
1,24,57,26,33,15,66,49,68
4,90,78,89,93,31,14,21,69
1,72,95,32,93,53,25,10,92
4,51,84,29,15,53,29,4,53
5,86,50,54,9,10,31,36,97
4,80,29,93,62,26,32,50,39
4,73,92,75,87,23,38,32,43
2,93,47,61,81,10,20,22,9
5,40,19,96,53,21,89,30,90
3,92,80,90,12,78,84,52,43
3,14,82,17,98,86,75,94,44
1,16,100,60,24,63,13,67,34
4,86,76,73,92,59,73,26,28
1,73,62,87,26,21,49,33,47
5,66,47,56,87,62,10,38,41
2,35,23,78,91,10,12,42,21
3,99,22,55,99,38,53,37,57
1,86,71,37,98,15,12,43,63
1,8,76,22,70,41,50,25,49
1,39,90,25,100,33,88,98,80
5,55,70,64,51,49,10,44,73
3,46,63,75,52,75,78,82,64
4,85,5,14,45,9,77,14,86
2,47,42,86,93,9,7,86,92
3,87,72,78,72,81,75,96,85
1,15,50,70,13,36,10,82,95
1,85,74,88,71,30,14,21,53
4,44,59,69,84,49,56,49,63
1,30,13,4,3,9,69,58,67
1,60,63,29,19,97,35,100,86
5,95,20,7,23,78,97,61,6
3,48,21,30,78,19,59,58,18
2,22,14,95,50,81,90,98,64
1,28,44,16,19,59,8,12,13
4,36,43,30,56,11,23,13,12
2,27,16,26,80,94,79,51,28
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
type: edu
files:
- name: task.py
  visible: true
  placeholders:
  - offset: 26
    length: 55
    placeholder_text: '# TODO'
  - offset: 97
    length: 47
    placeholder_text: '# TODO'
  - offset: 156
    length: 17
    placeholder_text: '# TODO'
  - offset: 216
    length: 35
    placeholder_text: '# TODO'
  - offset: 275
    length: 29
    placeholder_text: '# TODO'
  - offset: 333
    length: 50
    placeholder_text: '# TODO'
- name: tests/test_task.py
  visible: false
- name: __init__.py
  visible: false
- name: tests/__init__.py
  visible: false
- name: data.csv
  visible: true
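The `offset`/`length` pairs in the placeholder list are hard to read on their own. A minimal sketch, assuming they are 0-based character offsets into `task.py` with `\n` line endings (an assumption about the course format, not something this commit states), slices the committed solution text to show which expression each `# TODO` placeholder hides. `TASK_SOURCE` reproduces the first ten lines of `task.py` from this commit; `placeholder_spans` is a hypothetical helper.

```python
# Sketch: show which text each task-info.yaml placeholder covers.
# Assumption: offsets are 0-based character positions in task.py with "\n" line endings.

# First ten lines of task.py as committed (the __main__ block is omitted).
TASK_SOURCE = "\n".join([
    "import numpy as np",
    "",
    "csv = np.genfromtxt('data.csv', delimiter=',', skip_header=1)",
    "data, labels = csv[:, 1:], np.array(csv[:, 0], dtype=np.int64)",
    "",
    "classes = np.unique(labels)",
    "unique_measurements, unique_data_counts = np.unique(data, return_counts=True)",
    "",
    "most_frequent_index = np.argmax(unique_data_counts)",
    "most_frequent_measurement = unique_measurements.flatten()[most_frequent_index]",
    "",
])

# (offset, length) pairs copied from task-info.yaml.
PLACEHOLDERS = [(26, 55), (97, 47), (156, 17), (216, 35), (275, 29), (333, 50)]


def placeholder_spans(source, placeholders):
    """Hypothetical helper: return the source text hidden by each placeholder."""
    return [source[offset:offset + length] for offset, length in placeholders]


if __name__ == "__main__":
    for span in placeholder_spans(TASK_SOURCE, PLACEHOLDERS):
        print(span)
```

Under that assumption, every placeholder covers exactly the right-hand side of one assignment and ends at the end of its line, so students write the expressions while the variable names stay visible.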
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
## Find Unique Values

The [`numpy.unique`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) function is
straightforward: it finds the unique elements of the input array and returns them as a sorted array:

```python
print(np.unique([1, 1, 2, 2, 3, 3]))
```
Output:
```text
[1 2 3]
```
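The example input above is already sorted, so the sorting step is easy to miss. A small sketch with made-up arrays (not the course data) shows that `numpy.unique` also sorts unsorted input, and that it works on strings as well:

```python
import numpy as np

# Unsorted input with repeated values: the result is deduplicated and sorted.
values = np.array([3, 1, 2, 3, 1])
unique_values = np.unique(values)
print(unique_values)  # [1 2 3]

# Strings are sorted lexicographically.
fruits = np.array(["pear", "apple", "pear", "fig"])
unique_fruits = np.unique(fruits)
print(unique_fruits)  # ['apple' 'fig' 'pear']
```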
Additionally, `numpy.unique` can:

- identify unique rows or columns of an array when the `axis` parameter is given (without `axis`, the search is performed on the **flattened** input array):

```python
a = np.array([[1, 2, 6], [4, 2, 3], [4, 2, 3]])
print(np.unique(a))
print(np.unique(a, axis=0))
```
Output:
```text
[1 2 3 4 6]
[[1 2 6]
 [4 2 3]]
```

- return the unique values and the number of occurrences of each unique value (`return_counts=True`):
```python
a = np.array([1, 2, 6, 4, 2, 3, 2])
print(np.unique(a, return_counts=True))
```
Output:
```text
(array([1, 2, 3, 4, 6]), array([1, 3, 1, 1, 1]))
```

- return the indices of the first occurrences of the unique values (`return_index=True`):

```python
a = np.array([1, 2, 6, 4, 2, 3, 2])
unique, index = np.unique(a, return_index=True)
print(unique, index)
```
Output:
```text
[1 2 3 4 6] [0 1 5 3 2]
```
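The list above shows each option on its own. A short sketch with made-up arrays covers two cases the examples leave implicit: `axis=1` for unique **columns**, and requesting several extra outputs in one call. Per the NumPy documentation, the optional arrays are returned after the unique values in the order indices, inverse, counts, so `return_index=True` together with `return_counts=True` yields three arrays.

```python
import numpy as np

# Unique columns: the first and last columns are identical.
m = np.array([[1, 2, 1],
              [3, 4, 3]])
unique_columns = np.unique(m, axis=1)
print(unique_columns)
# [[1 2]
#  [3 4]]

# Several outputs at once: unique values, first-occurrence indices, counts.
a = np.array([1, 2, 6, 4, 2, 3, 2])
unique, index, counts = np.unique(a, return_index=True, return_counts=True)

# Indexing the input with `index` recovers the unique values.
print(np.array_equal(a[index], unique))  # True
print(counts)  # [1 3 1 1 1]
```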
### Task
You are given a dataset in the file `data.csv`. The first column contains ids (class labels);
all other columns contain the values of several metrics collected for each entry.
1. [Load the dataset](course://NumPy/Array Basics/Reading and Writing Files) from the file into the variable `csv`. Mind the header!
2. [Split](course://NumPy/Array Indexing and Slicing/Indexing Basics) the dataset into `data` (a 2-D array) and `labels` (a 1-D array of **integers**).
3. Determine the set of classes represented in the dataset and assign it to
the variable `classes`.
4. Find the unique values in the dataset (`data`) and their counts (`unique_measurements` and `unique_data_counts`).
5. Find the index of the most frequent measurement value (`most_frequent_index`) and use that index to get the measurement itself
(`most_frequent_measurement`).

<div class="hint">For the last step, you can use <code>numpy.argmax</code>.</div>
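The steps above can be sketched end to end on a tiny made-up CSV in the same layout as `data.csv` (label column first, then metrics). `io.StringIO` stands in for the file, since `numpy.genfromtxt` also accepts file-like objects; the helper name `summarize` and the sample data are hypothetical. This is one possible approach, not the only valid solution.

```python
import io

import numpy as np

# Made-up sample: a header row, then a label column followed by metrics.
SAMPLE_CSV = "id,metric1,metric2,metric3\n1,5,7,5\n2,7,7,1\n1,3,7,9\n"


def summarize(source):
    """Hypothetical helper: classes, unique values, counts, and the most frequent value."""
    # skip_header=1 drops the text header row.
    csv = np.genfromtxt(source, delimiter=",", skip_header=1)
    # The first column holds the labels; cast them to integers.
    data, labels = csv[:, 1:], csv[:, 0].astype(np.int64)

    classes = np.unique(labels)
    values, counts = np.unique(data, return_counts=True)

    # argmax gives the position of the largest count in `counts`,
    # which is also the position of the corresponding value in `values`.
    index = np.argmax(counts)
    return classes, values, counts, index, values[index]


classes, values, counts, index, mode = summarize(io.StringIO(SAMPLE_CSV))
print(classes)      # [1 2]
print(values)       # [1. 3. 5. 7. 9.]
print(counts)       # [1 1 2 4 1]
print(index, mode)  # 3 7.0
```

Because `values` and `counts` are aligned element by element, one `argmax` over the counts is enough to locate the most frequent value.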
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
import numpy as np

csv = np.genfromtxt('data.csv', delimiter=',', skip_header=1)
data, labels = csv[:, 1:], np.array(csv[:, 0], dtype=np.int64)

classes = np.unique(labels)
unique_measurements, unique_data_counts = np.unique(data, return_counts=True)

most_frequent_index = np.argmax(unique_data_counts)
most_frequent_measurement = unique_measurements.flatten()[most_frequent_index]

if __name__ == "__main__":
    print(classes)
    print(unique_data_counts)
    print(most_frequent_index)
    print(most_frequent_measurement)
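Two details of this solution are worth a closer look. Called without `axis`, `numpy.unique` already returns a 1-D array, so `.flatten()` in the last assignment does not change the result (it only makes a copy). And if several values share the highest count, `numpy.argmax` returns the first maximal index; because the unique values are sorted, that is the smallest of the tied values. A small sketch on a made-up 2-by-2 array:

```python
import numpy as np

# Made-up data in which 5.0 and 7.0 both occur twice.
data = np.array([[5.0, 7.0],
                 [7.0, 5.0]])
values, counts = np.unique(data, return_counts=True)

# Without `axis`, the result is already one-dimensional.
print(values.ndim)  # 1
print(np.array_equal(values.flatten(), values))  # True

# Tie: argmax returns the first position with the maximal count.
index = np.argmax(counts)
print(counts, index, values[index])  # [2 2] 0 5.0
```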

NumPy/Compare Search/Find Unique Values/tests/__init__.py

Whitespace-only changes.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
import unittest
import numpy as np

from task import *

test_csv = np.genfromtxt('data.csv', delimiter=',', skip_header=1)
test_data, test_labels = test_csv[:, 1:], np.array(test_csv[:, 0], dtype=np.int64)
test_classes = np.unique(test_labels)
test_unique_measurements, test_unique_data_counts = np.unique(test_data, return_counts=True)
test_most_frequent_index = np.argmax(test_unique_data_counts)
test_most_frequent_measurement = test_unique_measurements.flatten()[test_most_frequent_index]


class TestCase(unittest.TestCase):
    def test_data(self):
        np.testing.assert_array_equal(csv, test_csv, err_msg='Dataset is imported improperly.')
        np.testing.assert_array_equal(data, test_data, err_msg='Array of measurements is off.')
        np.testing.assert_array_equal(labels, test_labels, err_msg='Labels array is off.')

    def test_unique(self):
        np.testing.assert_array_equal(classes, test_classes,
                                      err_msg='The set of classes is wrong.')
        np.testing.assert_array_equal(unique_measurements, test_unique_measurements,
                                      err_msg='The set of unique measurements is wrong.')
        np.testing.assert_array_equal(unique_data_counts, test_unique_data_counts,
                                      err_msg='The set containing the number of occurrences of the unique values is wrong.')

    def test_most_frequent(self):
        self.assertEqual(most_frequent_index, test_most_frequent_index,
                         msg="The index of the most frequent value is incorrect.")
        self.assertEqual(most_frequent_measurement, test_most_frequent_measurement,
                         msg="The most frequent value is identified incorrectly.")
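The `test_data` method compares `csv` with a reference loaded using `skip_header=1`, which is what the task's "Mind the header!" warns about. A small sketch on a made-up CSV shows what goes wrong without it: with the default float dtype, `numpy.genfromtxt` cannot convert the header's text fields and fills them with `nan`, so the array gains an extra all-`nan` row and the shapes no longer match.

```python
import io

import numpy as np

# Made-up CSV with a text header, like data.csv.
SAMPLE_CSV = "id,metric1,metric2\n1,5,7\n2,7,1\n"

with_skip = np.genfromtxt(io.StringIO(SAMPLE_CSV), delimiter=",", skip_header=1)
without_skip = np.genfromtxt(io.StringIO(SAMPLE_CSV), delimiter=",")

print(with_skip.shape)     # (2, 3)
print(without_skip.shape)  # (3, 3)
# The header row could not be parsed as floats, so it is all nan.
print(np.isnan(without_skip[0]).all())  # True
```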

NumPy/Compare Search/lesson-info.yaml

Lines changed: 1 addition & 0 deletions
@@ -4,3 +4,4 @@ content:
 - Element-wise Comparison
 - Find maximum
 - Search
+- Find Unique Values
