Add micro-blog exercise #1509

Merged
merged 5 commits into from
Sep 21, 2019
125 changes: 125 additions & 0 deletions exercises/micro-blog/canonical-data.json
@@ -0,0 +1,125 @@
{
"exercise": "micro-blog",
"version": "1.0.0",
"comments": [
"This exercise is only applicable to languages that use UTF-8, UTF-16,",
"or another variable-width, Unicode-compatible encoding as their internal",
"string representation.",
"",
"This exercise is probably too easy in languages that use Unicode-aware",
"string slicing.",
"",
"When adding tests to the problem specification, bear in mind that a test",
"which catches an in-progress solution in a UTF-8 language might not",
"catch it in a UTF-16 language, and vice versa.",
"",
"Avoid adding tests that involve characters (graphemes) that are made up",
"of multiple code points, or introduce them as a more advanced step.",
"",
"Consider adding a track-specific hint.md that says whether your language",
"uses UTF-8, UTF-16 or another encoding for its internal string representation."
],
"cases": [
{
"description": "Truncate a micro blog post",
"cases": [
{
"description": "English language short",
"property": "truncate",
"input": {
"phrase": "Hi"
},
"expected": "Hi"
},
{
"description": "English language long",
"property": "truncate",
"input": {
"phrase": "Hello there"
},
"expected": "Hello"
},
{
"description": "German language short (broth)",
"property": "truncate",
"input": {
"phrase": "brühe"
},
"expected": "brühe"
},
{
"description": "German language long (bear carpet → beards)",
"property": "truncate",
"input": {
"phrase": "Bärteppich"
},
"expected": "Bärte"
},
{
"description": "Bulgarian language short (good)",
"property": "truncate",
"input": {
"phrase": "Добър"
},
"expected": "Добър"
},
{
"description": "Greek language short (health)",
"property": "truncate",
"input": {
"phrase": "υγειά"
},
"expected": "υγειά"
},
{
"description": "Maths short",
"property": "truncate",
"input": {
"phrase": "a=πr²"
},
"expected": "a=πr²"
},
{
"description": "Maths long",
"property": "truncate",
"input": {
"phrase": "∅⊊ℕ⊊ℤ⊊ℚ⊊ℝ⊊ℂ"
},
"expected": "∅⊊ℕ⊊ℤ"
},
{
"description": "English and emoji short",
"property": "truncate",
"input": {
"phrase": "Fly 🛫"
},
"expected": "Fly 🛫"
},
{
"description": "Emoji short",
"property": "truncate",
"input": {
"phrase": "💇"
},
"expected": "💇"
},
{
"description": "Emoji long",
"property": "truncate",
"input": {
"phrase": "❄🌡🤧🤒🏥🕰😀"
},
"expected": "❄🌡🤧🤒🏥"
},
{
"description": "Royal Flush?",
"property": "truncate",
"input": {
"phrase": "🃎🂸🃅🃋🃍🃁🃊"
},
"expected": "🃎🂸🃅🃋🃍"
}
]
}
]
}
39 changes: 39 additions & 0 deletions exercises/micro-blog/description.md
@@ -0,0 +1,39 @@
You have identified a gap in the social media market for very, very short
posts. Now that Twitter allows 280-character posts, people wanting quick
social media updates aren't being served. You decide to create your own
social media network.

To make your product noteworthy, you make it extreme and only allow posts
of 5 or fewer characters. Any post of more than 5 characters should be
truncated to 5.

To allow your users to express themselves fully, you allow Emoji and
other Unicode.

The task is to truncate input strings to 5 characters.

## Text Encodings

Text stored digitally has to be converted to a series of bytes.
Three ways of mapping characters to bytes are in common use:
* **ASCII** can encode English language characters. All
characters are precisely 1 byte long.
* **UTF-8** is a Unicode text encoding. Characters take between 1
and 4 bytes.
* **UTF-16** is a Unicode text encoding. Characters are either 2 or
4 bytes long.

UTF-8 and UTF-16 are both Unicode encodings, which means they're capable of
representing a massive range of characters, including:
* Text in most of the world's languages and scripts
* Historic text
* Emoji

UTF-8 and UTF-16 are both variable-length encodings, which means that
different characters take up different amounts of space.

Consider the letter 'a' and the emoji '😛'. In UTF-16 the letter takes
2 bytes but the emoji takes 4 bytes.
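
A quick way to check these sizes is to encode each character and count the
bytes. The Python sketch below is illustrative only: `byte_lengths` is a
hypothetical helper, not part of the exercise, and Python is used purely as
an example language.

```python
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string.

    `byte_lengths` is a hypothetical helper for illustration.
    "utf-16-le" is used so that no 2-byte byte-order mark is prepended,
    which would otherwise inflate the UTF-16 count.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


print(byte_lengths("a"))   # (1, 2)
print(byte_lengths("😛"))  # (4, 4)
```

The emoji lies outside the Basic Multilingual Plane, so UTF-16 stores it as
a pair of 2-byte code units, which is where the 4 bytes come from.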

The trick to this exercise is to use APIs designed around Unicode
characters (code points) instead of Unicode code units.
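
As a minimal sketch of the difference, assuming Python (whose `str` type is
indexed by code point), truncation by code points is a plain slice. The
second function is a deliberately naive contrast that cuts after a fixed
number of UTF-16 code units. Both function names are hypothetical and chosen
only for this illustration.

```python
def truncate(phrase: str, limit: int = 5) -> str:
    """Keep at most `limit` code points.

    Python strings are sequences of code points, so slicing is enough here.
    """
    return phrase[:limit]


def truncate_utf16_units(phrase: str, limit: int = 5) -> str:
    """Naive contrast: keep at most `limit` UTF-16 code units.

    Each code unit is 2 bytes, so the cut can land in the middle of a
    surrogate pair. errors="replace" stops the decoder from raising on
    the resulting incomplete data.
    """
    units = phrase.encode("utf-16-le")
    return units[: limit * 2].decode("utf-16-le", errors="replace")


print(truncate("Hello there"))  # Hello
print(truncate("Fly 🛫"))       # Fly 🛫
# The naive version does not return the phrase intact: the emoji needs two
# code units, and the cut after five units falls between them.
print(truncate_utf16_units("Fly 🛫") == "Fly 🛫")  # False
```

Slicing by code point is sufficient for single-code-point characters. A
grapheme built from several code points (for example, a letter followed by a
combining accent) would still be split by this approach.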
3 changes: 3 additions & 0 deletions exercises/micro-blog/metadata.yml
@@ -0,0 +1,3 @@
---
title: "Micro Blog"
blurb: "Given an input string, truncate it to 5 characters."