# Word Count

Given a phrase, count the occurrences of each word in that phrase.

For example, for the input `"olly olly in come free"`:

```text
olly: 2
in: 1
come: 1
free: 1
```

## Source

This is a classic toy problem, but we were reminded of it by seeing it in the Go Tour.


## Version compatibility
This exercise has been tested on Julia versions >=1.0.

## Submitting Incomplete Solutions
It's possible to submit an incomplete solution so you can see how others have completed the exercise.


In [50]:
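# submit
using StatsBase  # provides countmap

# Lowercase the input, replace every character that is not a word character,
# a space, or an apostrophe inside a word with a space, then split and count.
wordcount(sentence::AbstractString) = (sentence
    |> lowercase
    |> s -> replace(s, r"(?!\b'\b)[^\w ]" => " ")
    |> s -> split(s, " ", keepempty=false)
    |> countmap)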


wordcount (generic function with 1 method)
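
The pipeline above relies on `countmap`, which is provided by the StatsBase package rather than by Base. As a rough sketch, the same counting could be done with a plain `Dict`. The name `wordcount_base` is hypothetical and not part of the original solution; the regular expression is copied from the cell above.

```julia
# Hypothetical StatsBase-free variant; `wordcount_base` is an illustrative name.
function wordcount_base(sentence::AbstractString)
    # Same cleanup as the submitted solution: lowercase, then replace every
    # character that is neither a word character nor a space with a space,
    # except an apostrophe that sits between two word characters.
    cleaned = replace(lowercase(sentence), r"(?!\b'\b)[^\w ]" => " ")

    counts = Dict{String,Int}()
    for word in split(cleaned, " ", keepempty=false)
        key = String(word)                     # SubString -> String
        counts[key] = get(counts, key, 0) + 1  # start at 0 for unseen words
    end
    return counts
end

wordcount_base("olly olly in come free")
```

Because the keys are converted to `String`, the result is a `Dict{String,Int}` rather than the `Dict{SubString{String},Int64}` that `countmap` returns for the output of `split`.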

In [51]:
using Test

# include("word-count.jl")

@testset "no words" begin
    @test wordcount(" .\n,\t!^&*()~@#\$%{}[]:;'/<>") == Dict()
end

@testset "count one word" begin
    @test wordcount("word") == Dict("word" => 1)
end

@testset "count one of each word" begin
    @test wordcount("one of each") == Dict("one" => 1, "of" => 1, "each" => 1)
end

@testset "multiple occurrences of a word" begin
    @test wordcount("one fish two fish red fish blue fish") == Dict("one" => 1, "fish" => 4, "two" => 1, "red" => 1, "blue" => 1)
end

@testset "handles cramped lists" begin
    @test wordcount("one,two,three") == Dict("one" => 1, "two" => 1, "three" => 1)
end

@testset "handles expanded lists" begin
    @test wordcount("one,\ntwo,\nthree") == Dict("one" => 1, "two" => 1, "three" => 1)
end

@testset "ignore punctuation" begin
    @test wordcount("car: carpet as java: javascript!!&@\$%^&") == Dict("car" => 1, "carpet" => 1, "as" => 1, "java" => 1, "javascript" => 1)
end

@testset "include numbers" begin
    @test wordcount("testing, 1, 2 testing") == Dict("testing" => 2, "1" => 1, "2" => 1)
end

@testset "normalize case" begin
    @test wordcount("go Go GO Stop stop") == Dict("go" => 3, "stop" => 2)
end

@testset "with apostrophes" begin
    @test wordcount("First: don't laugh. Then: don't cry.") == Dict("first" => 1, "don't" => 2, "laugh" => 1, "then" => 1, "cry" => 1)
end

@testset "with quotations" begin
    @test wordcount("Joe can't tell between 'large' and large.") == Dict("joe" => 1, "can't" => 1, "tell" => 1, "between" => 1, "large" => 2, "and" => 1)
end


Test Summary: | Pass  Total
no words      |    1      1
Test Summary:  | Pass  Total
count one word |    1      1
Test Summary:          | Pass  Total
count one of each word |    1      1
Test Summary:                  | Pass  Total
multiple occurrences of a word |    1      1
Test Summary:         | Pass  Total
handles cramped lists |    1      1
Test Summary:          | Pass  Total
handles expanded lists |    1      1
Test Summary:      | Pass  Total
ignore punctuation |

Test.DefaultTestSet("with quotations", Any[], 1, false)

In [53]:
# To submit your exercise, you need to save your solution in a file called word-count.jl before using the CLI.
# You can either create it manually or use the following functions, which will automatically
# save every notebook cell starting with `# submit` in that file.

# Pkg.add("Exercism")
using Exercism
Exercism.create_submission("word-count")


178

In [3]:
import Pkg; Pkg.add("StatsBase")

  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Installed GR ─ v0.39.0
  Updating `~/.julia/environments/v1.1/Project.toml`
  [2913bbd2] + StatsBase v0.29.0
  Updating `~/.julia/environments/v1.1/Manifest.toml`
  [28b8d3ca] ↑ GR v0.38.1 ⇒ v0.39.0
  Building GR → `~/.julia/packages/GR/Q8slp/deps/build.log`


In [5]:
using StatsBase

In [6]:
countmap([1,2,3,3,3,4,5,6])

Dict{Int64,Int64} with 6 entries:
  4 => 1
  2 => 1
  3 => 3
  5 => 1
  6 => 1
  1 => 1

In [36]:
?split

search: split splitext splitdir splitpath splitdrive rsplit splice! displaysize



```
split(str::AbstractString, dlm; limit::Integer=0, keepempty::Bool=true)
split(str::AbstractString; limit::Integer=0, keepempty::Bool=false)
```

Split `str` into an array of substrings on occurrences of the delimiter(s) `dlm`. `dlm` can be any of the formats allowed by `findnext`'s first argument (i.e. as a string, regular expression or a function), or as a single character or collection of characters.

If `dlm` is omitted, it defaults to `isspace`.

The optional keyword arguments are:

  * `limit`: the maximum size of the result. `limit=0` implies no maximum (default)
  * `keepempty`: whether empty fields should be kept in the result. Default is `false` without a `dlm` argument, `true` with a `dlm` argument.

See also `rsplit`.

# Examples

```jldoctest
julia> a = "Ma.rch"
"Ma.rch"

julia> split(a,".")
2-element Array{SubString{String},1}:
 "Ma"
 "rch"
```
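
The `keepempty` defaults described above are easy to trip over, so here is a small sketch of the three cases. The variable names are illustrative only.

```julia
s = "one  two"   # two spaces between the words

# With an explicit delimiter, keepempty defaults to true, so the empty
# field between the two spaces is kept.
with_dlm = split(s, " ")

# Passing keepempty=false drops that empty field.
without_empty = split(s, " ", keepempty=false)

# With no delimiter, split uses isspace and keepempty defaults to false,
# so any run of whitespace (including newlines) separates words.
default_split = split("one \n two")

println(with_dlm)
println(without_empty)
println(default_split)
```

This is why the `wordcount` solution passes `keepempty=false` explicitly: it splits on `" "`, and replacing punctuation with spaces can leave several spaces in a row.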


In [24]:
temp("HEllo!!!!! there sonny jim")

Dict{SubString{String},Int64} with 4 entries:
  "there" => 1
  "HEllo" => 1
  "sonny" => 1
  "jim"   => 1