[New Exercise]: Word Count #236

Merged
merged 2 commits into from
Jul 31, 2023
8 changes: 8 additions & 0 deletions config.json
@@ -395,6 +395,14 @@
"prerequisites": [],
"difficulty": 2
},
{
"slug": "word-count",
"name": "Word Count",
"uuid": "0c5a329e-fb07-4c6e-abbf-039b9fc17fcb",
"practices": [],
"prerequisites": [],
"difficulty": 2
},
{
"slug": "matrix",
"name": "Matrix",
47 changes: 47 additions & 0 deletions exercises/practice/word-count/.docs/instructions.md
@@ -0,0 +1,47 @@
# Instructions

Your task is to count how many times each word occurs in a subtitle of a drama.

The subtitles from these dramas use only ASCII characters.

The characters often speak in casual English, using contractions like _they're_ or _it's_.
Though these contractions come from two words (e.g. _we are_), the contraction (_we're_) is considered a single word.

Words can be separated by any form of punctuation (e.g. ":", "!", or "?") or whitespace (e.g. "\t", "\n", or " ").
The only punctuation that does not separate words is the apostrophe in contractions.

Numbers are considered words.
If the subtitles say _It costs 100 dollars._ then _100_ will be its own word.
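
The separation rules above can be sketched with a single regular expression. This is only an illustration under stated assumptions: the function name `Split-SubtitleWords` and the pattern are hypothetical and are not part of the exercise files.

```powershell
# Hypothetical helper (not part of the exercise): split a subtitle into words.
function Split-SubtitleWords {
    param([string]$Text)

    # Assumption: a word is a run of ASCII letters or digits, optionally joined
    # by single apostrophes that sit between such runs (as in "doesn't").
    $pattern = "[a-z0-9]+(?:'[a-z0-9]+)*"

    # Lowercase first so the pattern only needs to list lowercase letters.
    [regex]::Matches($Text.ToLowerInvariant(), $pattern) |
        ForEach-Object { $_.Value }
}

# Numbers and contractions each come out as one word.
Split-SubtitleWords -Text "It costs 100 dollars, doesn't it?"
```

Under this pattern an apostrophe only stays in a word when it has a letter or digit on both sides, so surrounding quotation marks are treated as separators.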

Words are case insensitive.
For example, the word _you_ occurs three times in the following sentence:

> You come back, you hear me? DO YOU HEAR ME?

The ordering of the word counts in the results doesn't matter.

Here's an example that incorporates several of the elements discussed above:

- simple words
- contractions
- numbers
- case insensitive words
- punctuation (including apostrophes) to separate words
- different forms of whitespace to separate words

`"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.`

The mapping for this subtitle would be:

```text
123: 1
agent: 1
cried: 1
fled: 1
i: 1
password: 2
so: 1
special: 1
that's: 1
the: 2
```
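
One way to check the mapping above is a short counting sketch. The function name `Measure-SubtitleWords` and the regular expression are assumptions made for illustration; this is not the track's reference solution.

```powershell
# Hypothetical function (not part of the exercise): count words in a subtitle.
function Measure-SubtitleWords {
    param([string]$Text)

    $counts = @{}
    # Assumption: letters/digits, with apostrophes allowed only inside a word.
    $pattern = "[a-z0-9]+(?:'[a-z0-9]+)*"

    foreach ($match in [regex]::Matches($Text.ToLowerInvariant(), $pattern)) {
        # A missing key yields $null, which [int] converts to 0.
        $counts[$match.Value] = [int]$counts[$match.Value] + 1
    }
    $counts
}

# The subtitle from the example, written as a here-string to keep its quotes.
$subtitle = @"
"That's the password: 'PASSWORD 123'!", cried the Special Agent.
So I fled.
"@

$result = Measure-SubtitleWords -Text $subtitle
$result.GetEnumerator() | Sort-Object Name |
    ForEach-Object { "{0}: {1}" -f $_.Key, $_.Value }
```

The sketch returns a hashtable, so the order of entries is not guaranteed; sorting by name is only there to make the printout easy to compare with the mapping.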
8 changes: 8 additions & 0 deletions exercises/practice/word-count/.docs/introduction.md
@@ -0,0 +1,8 @@
# Introduction

You teach English as a foreign language to high school students.

You've decided to base your entire curriculum on TV shows.
You need to analyze which words are used, and how often they're repeated.

This will let you choose the simplest shows to start with, and to gradually increase the difficulty as time passes.
31 changes: 31 additions & 0 deletions exercises/practice/word-count/.meta/WordCount.example.ps1
@@ -0,0 +1,31 @@
Function Get-WordCount() {
<#
.SYNOPSIS
Given a phrase, count how many times each word appears.

.DESCRIPTION
Count how many times each word appears in a phrase. Numbers in the string are also counted as words, and words are case insensitive.

.PARAMETER Phrase
The phrase whose words are counted.

.EXAMPLE
Get-WordCount -Phrase "Hello, welcome to exercism!"
Returns: @{ hello = 1; welcome = 1; to = 1; exercism = 1 }
#>
[CmdletBinding()]
Param(
[string]$Phrase
)
$WordCounts = @{}

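# Match a word with an optional apostrophe (e.g. "can't"), a plain run of letters, or a whole number (e.g. "100").
[regex]::Matches($Phrase, "[A-Za-z]+'?[A-Za-z]*\b|[A-Za-z]+|\d+") | ForEach-Object {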
$word = $_.Value.ToLower()
if ($WordCounts.ContainsKey($word)) {
$WordCounts[$word]++
} else {
$WordCounts[$word] = 1
}
}
return $WordCounts
}
16 changes: 16 additions & 0 deletions exercises/practice/word-count/.meta/config.json
@@ -0,0 +1,16 @@
{
"authors": ["glaxxie"],
"files": {
"solution": [
"WordCount.ps1"
],
"test": [
"WordCount.tests.ps1"
],
"example": [
".meta/WordCount.example.ps1"
]
},
"blurb": "Given a phrase, count the occurrences of each word in that phrase.",
"source": "This is a classic toy problem, but we were reminded of it by seeing it in the Go Tour."
}
57 changes: 57 additions & 0 deletions exercises/practice/word-count/.meta/tests.toml
@@ -0,0 +1,57 @@
# This is an auto-generated file.
#
# Regenerating this file via `configlet sync` will:
# - Recreate every `description` key/value pair
# - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
# - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
# - Preserve any other key/value pair
#
# As user-added comments (using the # character) will be removed when this file
# is regenerated, comments can be added via a `comment` key.

[61559d5f-2cad-48fb-af53-d3973a9ee9ef]
description = "count one word"

[5abd53a3-1aed-43a4-a15a-29f88c09cbbd]
description = "count one of each word"

[2a3091e5-952e-4099-9fac-8f85d9655c0e]
description = "multiple occurrences of a word"

[e81877ae-d4da-4af4-931c-d923cd621ca6]
description = "handles cramped lists"

[7349f682-9707-47c0-a9af-be56e1e7ff30]
description = "handles expanded lists"

[a514a0f2-8589-4279-8892-887f76a14c82]
description = "ignore punctuation"

[d2e5cee6-d2ec-497b-bdc9-3ebe092ce55e]
description = "include numbers"

[dac6bc6a-21ae-4954-945d-d7f716392dbf]
description = "normalize case"

[4185a902-bdb0-4074-864c-f416e42a0f19]
description = "with apostrophes"
include = false

[4ff6c7d7-fcfc-43ef-b8e7-34ff1837a2d3]
description = "with apostrophes"
reimplements = "4185a902-bdb0-4074-864c-f416e42a0f19"

[be72af2b-8afe-4337-b151-b297202e4a7b]
description = "with quotations"

[8d6815fe-8a51-4a65-96f9-2fb3f6dc6ed6]
description = "substrings from the beginning"

[c5f4ef26-f3f7-4725-b314-855c04fb4c13]
description = "multiple spaces not detected as a word"

[50176e8a-fe8e-4f4c-b6b6-aa9cf8f20360]
description = "alternating word separators not detected as a word"

[6d00f1db-901c-4bec-9829-d20eb3044557]
description = "quotation for word with apostrophe"
22 changes: 22 additions & 0 deletions exercises/practice/word-count/WordCount.ps1
@@ -0,0 +1,22 @@
Function Get-WordCount() {
<#
.SYNOPSIS
Given a phrase, count how many times each word appears.

.DESCRIPTION
Count how many times each word appears in a phrase. Numbers in the string are also counted as words, and words are case insensitive.

.PARAMETER Phrase
The phrase whose words are counted.

.EXAMPLE
Get-WordCount -Phrase "Hello, welcome to exercism!"
Returns: @{ hello = 1; welcome = 1; to = 1; exercism = 1 }
#>
[CmdletBinding()]
Param(
[string]$Phrase
)
Throw "Please implement this function"
}

118 changes: 118 additions & 0 deletions exercises/practice/word-count/WordCount.tests.ps1
@@ -0,0 +1,118 @@
BeforeAll {
. ".\WordCount.ps1"
}

Describe "Word Count Test Cases" {
It "count one word" {
$got = (Get-WordCount -Phrase "hello").GetEnumerator() | Sort-Object Name
$want = @{ hello = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "count one of each word" {
$got = (Get-WordCount -Phrase "welcome to exercism").GetEnumerator() | Sort-Object Name
$want = @{ welcome = 1; to = 1; exercism = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "multiple occurrences of a word" {
$got = (Get-WordCount -Phrase "one fish two fish red fish blue fish").GetEnumerator() | Sort-Object Name
$want = @{ one = 1; fish = 4; two = 1; red = 1; blue = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "handles cramped lists" {
$got = (Get-WordCount -Phrase "one,two,three").GetEnumerator() | Sort-Object Name
$want = @{ one = 1; two = 1; three = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "handles expanded lists" {
$got = (Get-WordCount -Phrase "one,`ntwo,`nthree").GetEnumerator() | Sort-Object Name
$want = @{ one = 1; two = 1; three = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "ignore punctuation" {
$got = (Get-WordCount -Phrase 'car: carpet as java: javascript!!&@$%^&').GetEnumerator() | Sort-Object Name
$want = @{ car = 1; carpet = 1; as = 1; java = 1; javascript = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "include numbers" {
$got = (Get-WordCount -Phrase "testing, 1, 2, 3 testing").GetEnumerator() | Sort-Object Name
$want = @{ testing = 2; "1" = 1; "2" = 1; "3" = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "normalize case" {
$got = (Get-WordCount -Phrase "go Go GO Stop stop").GetEnumerator() | Sort-Object Name
$want = @{ go = 3; stop = 2 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "with quotations" {
$got = (Get-WordCount -Phrase "Joe can't tell between 'large' and large.").GetEnumerator() | Sort-Object Name
$want = @{ joe = 1; "can't" = 1; tell = 1; between = 1; large = 2; and = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "substrings from the beginning" {
$got = (Get-WordCount -Phrase "Joe can't tell between app, apple and a.").GetEnumerator() | Sort-Object Name
$want = @{ joe = 1; "can't" = 1; tell = 1; between = 1; app = 1; apple = 1; and = 1; a = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "multiple spaces not detected as a word" {
$got = (Get-WordCount -Phrase " multiple   whitespaces ").GetEnumerator() | Sort-Object Name
$want = @{ multiple = 1; whitespaces = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "alternating word separators not detected as a word" {
$got = (Get-WordCount -Phrase ",`n,one,`n ,two `n 'three'").GetEnumerator() | Sort-Object Name
$want = @{ one = 1; two = 1; three = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "quotation for word with apostrophe" {
$got = (Get-WordCount -Phrase "can, can't, 'can't'").GetEnumerator() | Sort-Object Name
$want = @{ can = 1; "can't" = 2 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "with apostrophes" {
$got = (Get-WordCount -Phrase "'First: don't laugh. Then: don't cry. You're getting it.'").GetEnumerator() | Sort-Object Name
$want = @{ first = 1; "don't" = 2; laugh = 1; then = 1; cry = 1; "you're" = 1; getting = 1; it = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "non alphanumeric" {
$got = (Get-WordCount -Phrase "hey,my_spacebar_is_broken").GetEnumerator() | Sort-Object Name
$want = @{ hey = 1; my = 1; spacebar = 1; is = 1; broken = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

It "multiple apostrophes ignored" {
$got = (Get-WordCount -Phrase "''hey''").GetEnumerator() | Sort-Object Name
$want = @{ hey = 1 }.GetEnumerator() | Sort-Object Name

$got | Should -BeExactly $want
}

}