# textacy Preprocessing Methods for NLP
The textacy `preprocessing` module normalizes, removes, or replaces problematic characters in text. Each function takes raw text (a plain string) and returns a new string. A convenience function, `make_pipeline`, chains multiple preprocessing operations into a single callable.

The text used in each section is copied directly from the textacy unit tests.

In [1]:
import textacy.preprocessing as preprocessing

### Normalize bullet points
No optional parameters.

In [2]:
text = ("• foo\n• bar\n"
        "• foo\n    • bar"
        "\n‣ item1\n⁃ item2\n⁌ item3\n⁍ item4\n∙ item5\n▪ item6\n● item7\n◦ item8"
        "\n⦾ item1\n⦿ item2\n・ item3"
)
bp_text = preprocessing.normalize.bullet_points(text)
print("Original   | Processed  |")
print("=========================")
for orig, proc in zip(text.split('\n'), bp_text.split('\n')):
    print(f"{orig:10} | {proc:10} |")


Original   | Processed  |
• foo      | - foo      |
• bar      | - bar      |
• foo      | - foo      |
    • bar  |     - bar  |
‣ item1    | - item1    |
⁃ item2    | - item2    |
⁌ item3    | - item3    |
⁍ item4    | - item4    |
∙ item5    | - item5    |
▪ item6    | - item6    |
● item7    | - item7    |
◦ item8    | - item8    |
⦾ item1    | - item1    |
⦿ item2    | - item2    |
・ item3    | - item3    |


### Normalize hyphenated words
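No optional parameters.

Note: the second sample below contains a line break, so splitting on `\n` gives nine original lines but only eight processed lines. From the third row on, the Processed column in the printed table is therefore one row ahead of the Original column.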
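
The joining rule visible in this section's output can be approximated with a single regular expression. This is a minimal sketch using only the standard library; the pattern and the name `join_split_words` are assumptions made for illustration, not textacy's own code. It joins two word fragments only when each has at least two non-digit word characters and the hyphen is followed by whitespace.

```python
import re

# Hypothetical pattern: 2+ non-digit word characters, a hyphen,
# at least one whitespace character, then 2+ non-digit word characters.
SPLIT_WORD = re.compile(r"([^\W\d]{2,})-\s+([^\W\d]{2,})")


def join_split_words(text: str) -> str:
    """Rejoin words that were broken by a hyphen followed by whitespace."""
    return SPLIT_WORD.sub(r"\1\2", text)


print(join_split_words("I see you shiver with antici-   \npation."))
print(join_split_words("My phone number is 555- 1234."))
print(join_split_words("I got an A- on the test."))
```

Requiring whitespace after the hyphen leaves ordinary compounds such as `antici-pation` alone, and excluding digits keeps phone numbers intact.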

In [3]:
text = ("I see you shiver with antici- pation.\n"
        "I see you shiver with antici-   \npation.\n"
        "I see you shiver with antici- PATION.\n"
        "I see you shiver with antici- 1pation.\n"
        "I see you shiver with antici pation.\n"
        "I see you shiver with antici-pation.\n"
        "My phone number is 555- 1234.\n"
        "I got an A- on the test."
)
hyphen_text = preprocessing.normalize.hyphenated_words(text)
print("{:38} | {:38} |".format("Original", "Processed"))
print("="*81)
for orig, hyphen in zip(text.split('\n'), hyphen_text.split('\n')):
    print(f"{orig:38} | {hyphen:38} |")

Original                               | Processed                              |
I see you shiver with antici- pation.  | I see you shiver with anticipation.    |
I see you shiver with antici-          | I see you shiver with anticipation.    |
pation.                                | I see you shiver with anticiPATION.    |
I see you shiver with antici- PATION.  | I see you shiver with antici- 1pation. |
I see you shiver with antici- 1pation. | I see you shiver with antici pation.   |
I see you shiver with antici pation.   | I see you shiver with antici-pation.   |
I see you shiver with antici-pation.   | My phone number is 555- 1234.          |
My phone number is 555- 1234.          | I got an A- on the test.               |


### Normalize quotation marks
No optional parameters.

In [4]:
text = ("These are ´funny single quotes´.\n"
        "These are ‘fancy single quotes’.\n"
        "These are “fancy double quotes”."
)
quote_text = preprocessing.normalize.quotation_marks(text)
print("{:32} | {:32} |".format("Original", "Processed"))
print("="*69)
for orig, quote in zip(text.split('\n'), quote_text.split('\n')):
    print(f"{orig:32} | {quote:32} |")

Original                         | Processed                        |
These are ´funny single quotes´. | These are 'funny single quotes'. |
These are ‘fancy single quotes’. | These are 'fancy single quotes'. |
These are “fancy double quotes”. | These are "fancy double quotes". |


### Normalize repeating characters
Optional parameters:
- chars: string, characters that are repeated
- maxn: int, number of allowable character repeats
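
To make the interaction between `chars` and `maxn` concrete, here is a minimal standard-library sketch. The function name `collapse_repeats` is hypothetical and the regex is an assumption about how such a cap could work, not textacy's implementation: any run of more than `maxn` consecutive copies of `chars` is replaced by exactly `maxn` copies.

```python
import re


def collapse_repeats(text: str, chars: str, maxn: int) -> str:
    """Cap consecutive repetitions of the substring `chars` at `maxn`."""
    # Match maxn + 1 or more back-to-back copies of the (escaped) substring.
    pattern = r"(?:%s){%d,}" % (re.escape(chars), maxn + 1)
    # A function replacement avoids backslash processing in the new text.
    return re.sub(pattern, lambda _match: chars * maxn, text)


text = "**Hello**, world!!! I wonder....... How are *you* doing?!?! lololol"
print(collapse_repeats(text, chars=".", maxn=3))
print(collapse_repeats(text, chars="ol", maxn=2))
```

Note that `chars` is treated as one repeated unit, so `chars="?!"` targets the sequence `?!?!` rather than runs of `?` or `!` on their own, and `maxn=0` deletes every occurrence.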

In [5]:
text = "**Hello**, world!!! I wonder....... How are *you* doing?!?! lololol"
repeating_text1 = preprocessing.normalize.repeating_chars(text, chars=".", maxn=3)
print("Original")
print(text)
print("="*67)
print('chars=".", maxn=3')
print(repeating_text1)
print("="*67)
repeating_text2 = preprocessing.normalize.repeating_chars(text, chars="*", maxn=1)
print('chars="*", maxn=1')
print(repeating_text2)
print("="*67)
repeating_text3 = preprocessing.normalize.repeating_chars(text, chars="?!", maxn=1)
print('chars="?!", maxn=1')
print(repeating_text3)
print("="*67)
repeating_text4 = preprocessing.normalize.repeating_chars(text, chars="ol", maxn=2)
print('chars="ol", maxn=2')
print(repeating_text4)
print("="*67)
repeating_text5 = preprocessing.normalize.repeating_chars(text, chars="*", maxn=0)
print('chars="*", maxn=0')
print(repeating_text5)

Original
**Hello**, world!!! I wonder....... How are *you* doing?!?! lololol
chars=".", maxn=3
**Hello**, world!!! I wonder... How are *you* doing?!?! lololol
chars="*", maxn=1
*Hello*, world!!! I wonder....... How are *you* doing?!?! lololol
chars="?!", maxn=1
**Hello**, world!!! I wonder....... How are *you* doing?! lololol
chars="ol", maxn=2
**Hello**, world!!! I wonder....... How are *you* doing?!?! lolol
chars="*", maxn=0
Hello, world!!! I wonder....... How are you doing?!?! lololol


### Normalize unicode
Optional parameters:
- form: str
    - NFC: canonical composition
    - NFD: canonical decomposition
    - NFKC: compatibility decomposition followed by canonical composition
    - NFKD: compatibility decomposition

See also:

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
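
The four forms can be compared directly with the standard library's `unicodedata.normalize`, which the link above documents. The sketch below is only an illustration; the helper name `normalized_forms` and the sample strings are chosen here and do not come from the textacy tests.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text: str) -> dict:
    """Return the text normalized under each of the four Unicode forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


samples = {
    "ellipsis": "\u2026",          # single-character horizontal ellipsis
    "precomposed e-acute": "\u00e9",
    "fi ligature": "\ufb01",
}
for name, sample in samples.items():
    for form, result in normalized_forms(sample).items():
        # repr() and len() expose differences that look identical on screen.
        print(f"{name:20} {form:5} {result!r:10} length={len(result)}")
```

Only the compatibility forms (NFKC, NFKD) rewrite the ellipsis and the ligature into plain ASCII letters; the canonical forms (NFC, NFD) differ only in whether accented letters are stored as one code point or as a base letter plus a combining mark.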

In [6]:
text = "Well… That's a long story."
unicode1_text = preprocessing.normalize.unicode(text, form="NFC")
unicode2_text = preprocessing.normalize.unicode(text, form="NFD")
unicode3_text = preprocessing.normalize.unicode(text, form="NFKC")
unicode4_text = preprocessing.normalize.unicode(text, form="NFKD")
print("{:26} | {:26} | {:26} | {:28} |".format("Original", 'form="NFC"', 'form="NFD"', 'form="NFKC"'))
print("="*117)
for orig, t1, t2, t3 in zip(text.split('\n'), unicode1_text.split('\n'), unicode2_text.split('\n'), unicode3_text.split('\n')):
    print(f"{orig:26} | {t1:26} | {t2:26} | {t3:28} |")

print("="*117)
print("{:28}".format('form="NFKD"'))
print("="*30)
for t4 in unicode4_text.split('\n'):
    print(f"{t4:28} |")

Original                   | form="NFC"                 | form="NFD"                 | form="NFKC"                  |
Well… That's a long story. | Well… That's a long story. | Well… That's a long story. | Well... That's a long story. |
form="NFKD"                 
Well... That's a long story. |


### Normalize whitespace
No optional parameters.
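
The behavior shown in this section (zero-width characters dropped, runs of line breaks collapsed to one, runs of other whitespace collapsed to a single space, ends trimmed) can be sketched with three regular expressions. This is a simplified stand-in written for illustration; the name `squeeze_whitespace` and the exact patterns are assumptions, not textacy's code.

```python
import re

ZERO_WIDTH = re.compile(r"[\u200B\u2060\uFEFF]+")  # invisible, not matched by \s
LINE_BREAKS = re.compile(r"(?:\r\n|\n)+")          # one or more line breaks
OTHER_SPACE = re.compile(r"[^\S\n]+")              # any whitespace except \n


def squeeze_whitespace(text: str) -> str:
    """Collapse whitespace runs and strip zero-width characters."""
    text = ZERO_WIDTH.sub("", text)
    text = LINE_BREAKS.sub("\n", text)
    text = OTHER_SPACE.sub(" ", text)
    return text.strip()


print(repr(squeeze_whitespace("Hello,\t\t  world!")))
print(repr(squeeze_whitespace("Hello\uFEFF,\n\n\nworld   !  ")))
```

Zero-width characters need their own pattern because Python does not classify them as whitespace, so `\s` alone would leave them in place.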

In [7]:
text = ("Hello,  world!\n"
        "Hello,     world!\n"
        "Hello,\tworld!\n"
        "Hello,\t\t  world!\n"
        "Hello,\n\nworld!\n"
        "Hello,\r\nworld!\n"
        "Hello\uFEFF, world!\n"
        "Hello\u200B\u200B, world!\n"
        "Hello\uFEFF,\n\n\nworld   !  "
)
whitespace_text = preprocessing.normalize.whitespace(text)
print("Original")
print("="*60)
print(text)
print("-"*60)
print("Processed")
print("="*60)
print(whitespace_text)

Original
Hello,  world!
Hello,     world!
Hello,	world!
Hello,		  world!
Hello,

world!
Hello,
world!
Hello﻿, world!
Hello​​, world!
Hello﻿,


world   !  
------------------------------------------------------------
Processed
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello,
world!
Hello,
world!
Hello, world!
Hello, world!
Hello,
world !


### Remove accents
Optional parameter:
- fast: bool
    - True: accented characters for all unicode symbols are removed
    - False: accents removed from unicode symbol with a direct ASCII equivalent
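
The sample text in this section gives identical results for both settings, so the difference is easier to see with a string that mixes accented Latin letters and non-Latin characters. The sketch below shows two standard-library strategies. Treating `strip_combining_marks` as the `fast=False` behavior and `force_ascii` as the `fast=True` behavior is an assumption based on the parameter descriptions above; both function names are hypothetical.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Decompose, then drop only the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def force_ascii(text: str) -> str:
    """Decompose, then drop every character that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


sample = "El niño visitó 東京"
print(repr(strip_combining_marks(sample)))  # non-Latin characters survive
print(repr(force_ascii(sample)))            # non-Latin characters are lost
```

The ASCII round trip is simpler but deletes anything without an ASCII form, which can silently remove whole words in multilingual text.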

In [8]:
text = ("El niño se asustó del pingüino -- qué miedo!\n"
        "El niño se asustó del pingüino -- qué miedo!\n"
        "Le garçon est très excité pour la forêt.\n"
        "Le garçon est très excité pour la forêt."
)
accent_text1 = preprocessing.remove.accents(text, fast=True)
# print(accent_text1)
accent_text2 = preprocessing.remove.accents(text, fast=False)
# print(accent_text2)
print("{:44} | {:44}".format("Original", "fast=True"))
print("="*60)
for orig, t1 in zip(text.split('\n'), accent_text1.split('\n')):
    print(f"{orig:44} | {t1:44} |")
print("")
print("{:44} | {:44}".format("Original", "fast=False"))
print("="*60)
for orig, t2 in zip(text.split('\n'), accent_text2.split('\n')):
    print(f"{orig:44} | {t2:44} |")

Original                                     | fast=True                                   
El niño se asustó del pingüino -- qué miedo! | El nino se asusto del pinguino -- que miedo! |
El niño se asustó del pingüino -- qué miedo! | El nino se asusto del pinguino -- que miedo! |
Le garçon est très excité pour la forêt.     | Le garcon est tres excite pour la foret.     |
Le garçon est très excité pour la forêt.     | Le garcon est tres excite pour la foret.     |

Original                                     | fast=False                                  
El niño se asustó del pingüino -- qué miedo! | El nino se asusto del pinguino -- que miedo! |
El niño se asustó del pingüino -- qué miedo! | El nino se asusto del pinguino -- que miedo! |
Le garçon est très excité pour la forêt.     | Le garcon est tres excite pour la foret.     |
Le garçon est très excité pour la forêt.     | Le garcon est tres excite pour la foret.     |


### Remove brackets
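Careful here: all text between the brackets is removed along with the brackets themselves. As the output below shows, nested brackets of the same type are only partly removed: the inner pair and its contents go, while the outer pair stays.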

Optional parameter:
- only: tuple, str - remove only the specified bracketed contents: "curly", "square", and/or "round"
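
The nested-bracket behavior seen in this section's output follows naturally from a pattern that refuses to match across another bracket of the same type. The following is a minimal sketch for round brackets only; the pattern and the name `remove_round_once` are assumptions for illustration, not textacy's implementation.

```python
import re

# An opening bracket, any characters that are NOT round brackets, a closing bracket.
ROUND = re.compile(r"\([^()]*\)")


def remove_round_once(text: str) -> str:
    """Remove innermost round-bracketed spans in a single pass."""
    return ROUND.sub("", text)


print(remove_round_once("Hello, world (1)!"))
print(remove_round_once("Hello, world (a [b])!"))
print(remove_round_once("Hello, world (a (b))!"))
```

A square-bracketed span inside round brackets is removed with its parent, but a round pair nested in another round pair leaves the outer brackets behind after one pass.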

In [9]:
text = ("Hello, {name}!\n"
        "Hello, world (DeWilde et al., 2021, p. 42)!\n"
        "Hello, world (1)!\n"
        "Hello, world [1]!\n"
        "Hello, world (and whomever it may concern [not that it's any of my business])!\n"
        "Hello, world (and whomever it may concern (not that it's any of my business))!\n"
        "Hello, world (and whomever it may concern [not that it's any of my business])!\n"
        "Hello, world [1]!\n"
        "Hello, world [1]!"
)
print("Original")
print("="*80)
print(text)
print("-"*80)
bracket_text1 = preprocessing.remove.brackets(text, only=None)
print("only=None")
print("="*80)
print(bracket_text1)
print("-"*80)
bracket_text2 = preprocessing.remove.brackets(text, only="round")
print('only="round"')
print("="*80)
print(bracket_text2)
print("-"*80)
bracket_text3 = preprocessing.remove.brackets(text, only="curly")
print('only="curly"')
print("="*80)
print(bracket_text3)
print("-"*80)
bracket_text4 = preprocessing.remove.brackets(text, only="square")
print('only="square"')
print("="*80)
print(bracket_text4)
print("-"*80)
bracket_text5 = preprocessing.remove.brackets(text, only=("square", "round"))
print('only=("square", "round")')
print("="*80)
print(bracket_text5)

Original
Hello, {name}!
Hello, world (DeWilde et al., 2021, p. 42)!
Hello, world (1)!
Hello, world [1]!
Hello, world (and whomever it may concern [not that it's any of my business])!
Hello, world (and whomever it may concern (not that it's any of my business))!
Hello, world (and whomever it may concern [not that it's any of my business])!
Hello, world [1]!
Hello, world [1]!
--------------------------------------------------------------------------------
only=None
Hello, !
Hello, world !
Hello, world !
Hello, world !
Hello, world !
Hello, world (and whomever it may concern )!
Hello, world !
Hello, world !
Hello, world !
--------------------------------------------------------------------------------
only="round"
Hello, {name}!
Hello, world !
Hello, world !
Hello, world [1]!
Hello, world !
Hello, world (and whomever it may concern )!
Hello, world !
Hello, world [1]!
Hello, world [1]!
--------------------------------------------------------------------------------
only="curly"
Hello, !
He

### Remove HTML tags
No optional parameters.

In [10]:
text = ("Hello, <i>world!</i>\n"
        "<title>Hello, world!</title>\n"
        '<title class="foo">Hello, world!</title>\n'
        "<html><head><title>Hello, <i>world!</i></title></head></html>\n"
            "<html>\n"
            "  <head>\n"
            '    <title class="foo">Hello, <i>world!</i></title>\n'
            "  </head>\n"
            "  <!--this is a comment-->\n"
            "  <body>\n"
            "    <p>How's it going?</p>\n"
            "  </body>\n"
            "</html>"
)
html_text = preprocessing.remove.html_tags(text)
print("Original")
print("="*60)
print(text)
print("-"*60)
print("Processed")
print("="*60)
print(html_text)

Original
Hello, <i>world!</i>
<title>Hello, world!</title>
<title class="foo">Hello, world!</title>
<html><head><title>Hello, <i>world!</i></title></head></html>
<html>
  <head>
    <title class="foo">Hello, <i>world!</i></title>
  </head>
  <!--this is a comment-->
  <body>
    <p>How's it going?</p>
  </body>
</html>
------------------------------------------------------------
Processed
Hello, world!
Hello, world!
Hello, world!
Hello, world!

  
    Hello, world!
  
  
  
    How's it going?


### Remove punctuation
Optional parameter:
- only: str, tuple, list - remove only the punctuation marks specified

In [11]:
text = "I can't. No, I won't! It's a matter of \"principle\"; of -- what's the word? -- conscience."
print("Original")
print("="*90)
print(text)
print("-"*90)
punc_text1 = preprocessing.remove.punctuation(text, only=None)
print("only=None (default)")
print("="*90)
print(punc_text1)
print("-"*90)
punc_text2 = preprocessing.remove.punctuation(text, only=".")
print('only="."')
print("="*90)
print(punc_text2)
print("-"*90)
punc_text3 = preprocessing.remove.punctuation(text, only=["-", "'", "\""])
print('only=["-", "\'", "\\""]')
print("="*90)
print(punc_text3)

Original
I can't. No, I won't! It's a matter of "principle"; of -- what's the word? -- conscience.
------------------------------------------------------------------------------------------
only=None (default)
I can t  No  I won t  It s a matter of  principle   of    what s the word     conscience 
------------------------------------------------------------------------------------------
only="."
I can't  No, I won't! It's a matter of "principle"; of -- what's the word? -- conscience 
------------------------------------------------------------------------------------------
only=["-", "'", "\""]
I can t. No, I won t! It s a matter of  principle ; of   what s the word?   conscience.


### Replace currency symbols
Optional parameter:
- repl: str, value to replace currency symbols, default \_CUR\_

In [12]:
text = ("$1.00 equals 100¢.\n"
        "How much is ¥100 in £?\n"
        "My password is 123$abc฿."
)
curr_text = preprocessing.replace.currency_symbols(text)
print("{:24} | {:32} |".format("Original", "Processed"))
print("="*61)
for orig, curr in zip(text.split('\n'), curr_text.split('\n')):
    print(f"{orig:24} | {curr:32} |")

Original                 | Processed                        |
$1.00 equals 100¢.       | _CUR_1.00 equals 100_CUR_.       |
How much is ¥100 in £?   | How much is _CUR_100 in _CUR_?   |
My password is 123$abc฿. | My password is 123_CUR_abc_CUR_. |


### Replace email addresses
Optional parameter:
- repl: str, text to replace email address, default \_EMAIL\_

In [13]:
text = ("Reach out at username@example.com.\n"
        "Click here: mailto:username@example.com."
)
email_text = preprocessing.replace.emails(text)
print("{:40} | {:21} |".format("Original", "Processed"))
print("="*66)
for orig, em in zip(text.split('\n'), email_text.split('\n')):
    print(f"{orig:40} | {em:21} |")

Original                                 | Processed             |
Reach out at username@example.com.       | Reach out at _EMAIL_. |
Click here: mailto:username@example.com. | Click here: _EMAIL_.  |


### Replace emoji
Optional parameter:
- repl: str, text to replace emoji and pictographs, default \_EMOJI\_

In [14]:
text = ("ugh, it's raining *again* ☔\n"
        "✌ tests are passing ✌"
)
emoji_text = preprocessing.replace.emojis(text)
print("{:28} | {:33} |".format("Original", "Processed"))
print("="*66)
for t, em in zip(text.split('\n'), emoji_text.split('\n')):
    print(f"{t:27} | {em:33} |")

Original                     | Processed                         |
ugh, it's raining *again* ☔ | ugh, it's raining *again* _EMOJI_ |
✌ tests are passing ✌       | _EMOJI_ tests are passing _EMOJI_ |


### Replace hashtags
Optional parameter:
- repl: str, text to replace hashtags, default \_TAG\_
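
The partial matches in this section's output (`_TAG_'teven`, `_TAG_-even-try`) are what one would expect if a hashtag is defined as `#` followed by word characters only. The sketch below is a deliberately simplified stand-in; the pattern and the name `mask_hashtags` are assumptions, and textacy's real pattern may differ.

```python
import re

# '#' not preceded by a word character, followed by one or more word characters.
HASHTAG = re.compile(r"(?<!\w)#\w+")


def mask_hashtags(text: str, repl: str = "_TAG_") -> str:
    """Replace hashtag-like tokens with a placeholder string."""
    # A function replacement keeps repl literal, even if it contains backslashes.
    return HASHTAG.sub(lambda _match: repl, text)


print(mask_hashtags("wth twitter #ican'teven #why-even-try"))
print(mask_hashtags("www.foo.com#fragment is not a hashtag"))
print(mask_hashtags("like omg it's #ThrowbackThursday", repl="<HASHTAG>"))
```

Because apostrophes and hyphens are not word characters, the match stops there and the rest of the token stays in the text. The lookbehind is what keeps a URL fragment such as `www.foo.com#fragment` from being treated as a hashtag.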

In [15]:
text = ("like omg it's #ThrowbackThursday\n"
        "#TextacyIn4Words: \"but it's honest work\"\n"
        "wth twitter #ican'teven #why-even-try\n"
        "www.foo.com#fragment is not a hashtag"
)
hash_text = preprocessing.replace.hashtags(text)
print("{:40} | {:38} |".format("Original", "Processed"))
print("="*83)
for t, h in zip(text.split('\n'), hash_text.split('\n')):
    print(f"{t:40} | {h:38} |")

Original                                 | Processed                              |
like omg it's #ThrowbackThursday         | like omg it's _TAG_                    |
#TextacyIn4Words: "but it's honest work" | _TAG_: "but it's honest work"          |
wth twitter #ican'teven #why-even-try    | wth twitter _TAG_'teven _TAG_-even-try |
www.foo.com#fragment is not a hashtag    | www.foo.com#fragment is not a hashtag  |


### Replace numbers
Optional parameter:
- repl: str, text to replace numbers, default \_NUMBER\_

In [16]:
text = "I owe $1,000.99 to 123 people for 2 +1 reasons."
num_text = preprocessing.replace.numbers(text)
print("Original")
print("="*65)
print(text)
print("-"*65)
print("Processed")
print("="*65)
print(num_text)

Original
I owe $1,000.99 to 123 people for 2 +1 reasons.
-----------------------------------------------------------------
Processed
I owe $_NUMBER_ to _NUMBER_ people for _NUMBER_ _NUMBER_ reasons.


### Replace phone numbers
Optional parameter:
- repl: str, text to replace phone numbers, default \_PHONE\_

In [17]:
text = "I can be reached at 555-123-4567 through next Friday."
phone_text = preprocessing.replace.phone_numbers(text)
print("Original")
print("="*53)
print(text)
print("-"*53)
print("Processed")
print("="*53)
print(phone_text)

Original
I can be reached at 555-123-4567 through next Friday.
-----------------------------------------------------
Processed
I can be reached at _PHONE_ through next Friday.


### Replace URLs
Optional parameter:
- repl: str, text to replace URL, default \_URL\_

In [18]:
text = "I learned everything I know from www.stackoverflow.com and http://wikipedia.org/ and Mom."
url_text = preprocessing.replace.urls(text)
print("Original")
print("="*89)
print(text)
print("-"*89)
print("Processed")
print("="*89)
print(url_text)

Original
I learned everything I know from www.stackoverflow.com and http://wikipedia.org/ and Mom.
-----------------------------------------------------------------------------------------
Processed
I learned everything I know from _URL_ and _URL_ and Mom.


### Replace user handles
Optional parameter:
- repl: str, text to replace user handles, default \_USER\_

In [19]:
text = ("like omg it's @bjdewilde\n"
        "@Real_Burton_DeWilde: definitely not a bot\n"
        "wth twitter @b.j.dewilde\n"
        "foo@bar.com is not a user handle"
)
user_text = preprocessing.replace.user_handles(text)
print("{:42} | {:32} |".format("Original", "Processed"))
print("="*79)
for t, u in zip(text.split('\n'), user_text.split('\n')):
    print(f"{t:42} | {u:32} |")

Original                                   | Processed                        |
like omg it's @bjdewilde                   | like omg it's _USER_             |
@Real_Burton_DeWilde: definitely not a bot | _USER_: definitely not a bot     |
wth twitter @b.j.dewilde                   | wth twitter _USER_.j.dewilde     |
foo@bar.com is not a user handle           | foo@bar.com is not a user handle |


### Combining into a pipeline
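
Conceptually, a pipeline is left-to-right function composition: the output of each step becomes the input of the next, and `functools.partial` fixes any keyword arguments in advance. The sketch below illustrates that idea with the standard library only. `compose` and `replace_char` are hypothetical helpers written for this example and are not a description of how `make_pipeline` is implemented.

```python
from functools import partial, reduce


def compose(*funcs):
    """Return a callable that applies funcs to a string from left to right."""
    def run(text: str) -> str:
        return reduce(lambda acc, func: func(acc), funcs, text)
    return run


def replace_char(text: str, old: str, new: str) -> str:
    """Toy step with keyword arguments, to show how partial() fits in."""
    return text.replace(old, new)


clean = compose(
    str.strip,
    str.lower,
    partial(replace_char, old="!", new="."),
)
print(clean("  Hello, World!  "))

# Order matters: the same two steps in a different order give different text.
print(compose(str.title, str.lower)("hello world"))
print(compose(str.lower, str.title)("hello world"))
```

The same ordering concern applies to real preprocessing steps, since an earlier step can remove or rewrite characters that a later step relies on to recognize a pattern.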

In [20]:
from functools import partial
all_text = ("• foo\n• bar\n"
            "• foo\n    • bar"
            "\n‣ item1\n⁃ item2\n⁌ item3\n⁍ item4\n∙ item5\n▪ item6\n● item7\n◦ item8"
            "\n⦾ item1\n⦿ item2\n・ item3\n"
            "I see you shiver with antici- pation.\n"
            "I see you shiver with antici-   \npation.\n"
            "I see you shiver with antici- PATION.\n"
            "I see you shiver with antici- 1pation.\n"
            "I see you shiver with antici pation.\n"
            "I see you shiver with antici-pation.\n"
            "My phone number is 555- 1234.\n"
            "I got an A- on the test.\n"
            "These are ´funny single quotes´.\n"
            "These are ‘fancy single quotes’.\n"
            "These are “fancy double quotes”.\n"
            "**Hello**, world!!! I wonder....... How are *you* doing?!?! lololol\n"
            "Hello,  world!\n"
            "Hello,     world!\n"
            "Hello,\tworld!\n"
            "Hello,\t\t  world!\n"
            "Hello,\n\nworld!\n"
            "Hello,\r\nworld!\n"
            "Hello\uFEFF, world!\n"
            "Hello\u200B\u200B, world!\n"
            "Hello\uFEFF,\n\n\nworld   !  \n"
            "El niño se asustó del pingüino -- qué miedo!\n"
            "El niño se asustó del pingüino -- qué miedo!\n"
            "Le garçon est très excité pour la forêt.\n"
            "Le garçon est très excité pour la forêt.\n"
            "Hello, {name}!\n"
            "Hello, world (DeWilde et al., 2021, p. 42)!\n"
            "Hello, world (1)!\n"
            "Hello, world [1]!\n"
            "Hello, world (and whomever it may concern [not that it's any of my business])!\n"
            "Hello, world (and whomever it may concern (not that it's any of my business))!\n"
            "Hello, world (and whomever it may concern [not that it's any of my business])!\n"
            "Hello, world [1]!\n"
            "Hello, world [1]!\n"
            "Hello, <i>world!</i>\n"
            "<title>Hello, world!</title>\n"
            '<title class="foo">Hello, world!</title>\n'
            "<html><head><title>Hello, <i>world!</i></title></head></html>\n"
            "<html>\n"
            "  <head>\n"
            '    <title class="foo">Hello, <i>world!</i></title>\n'
            "  </head>\n"
            "  <!--this is a comment-->\n"
            "  <body>\n"
            "    <p>How's it going?</p>\n"
            "  </body>\n"
            "</html>\n"
            "I can't. No, I won't! It's a matter of \"principle\"; of -- what's the word? -- conscience.\n"
            "$1.00 equals 100¢.\n"
            "How much is ¥100 in £?\n"
            "My password is 123$abc฿.\n"
            "Reach out at username@example.com.\n"
            "Click here: mailto:username@example.com.\n"
            "ugh, it's raining *again* ☔\n"
            "✌ tests are passing ✌\n"
            "like omg it's #ThrowbackThursday\n"
            "#TextacyIn4Words: \"but it's honest work\"\n"
            "wth twitter #ican'teven #why-even-try\n"
            "www.foo.com#fragment is not a hashtag\n"
            "I owe $1,000.99 to 123 people for 2 +1 reasons.\n"
            "I can be reached at 555-123-4567 through next Friday.\n"
            "I learned everything I know from www.stackoverflow.com and http://wikipedia.org/ and Mom.\n"
            "like omg it's @bjdewilde\n"
            "@Real_Burton_DeWilde: definitely not a bot\n"
            "wth twitter @b.j.dewilde\n"
            "foo@bar.com is not a user handle"
)

pipeline_list = [preprocessing.normalize.bullet_points, 
                 preprocessing.normalize.hyphenated_words,
                 preprocessing.normalize.quotation_marks,
                 partial(preprocessing.normalize.repeating_chars, chars=".", maxn=3),
                 partial(preprocessing.normalize.unicode, form="NFKC"),
                 preprocessing.normalize.whitespace,
                 partial(preprocessing.remove.accents, fast=True),
                 partial(preprocessing.remove.brackets, only="square"),
                 preprocessing.remove.html_tags,
                 partial(preprocessing.remove.punctuation, only="\""),
                 preprocessing.replace.currency_symbols,
                 preprocessing.replace.emails,
                 preprocessing.replace.emojis,
                 preprocessing.replace.hashtags,
                 preprocessing.replace.numbers,
                 preprocessing.replace.phone_numbers,
                 preprocessing.replace.urls,
                 preprocessing.replace.user_handles]
pipeline = preprocessing.make_pipeline(*pipeline_list)
clean_text = pipeline(all_text)
print(clean_text)

- foo
- bar
- foo
 - bar
- item1
- item2
- item3
- item4
- item5
- item6
- item7
- item8
- item1
- item2
- item3
I see you shiver with anticipation.
I see you shiver with anticipation.
I see you shiver with anticiPATION.
I see you shiver with antici- 1pation.
I see you shiver with antici pation.
I see you shiver with antici-pation.
My phone number is _NUMBER_- _NUMBER_.
I got an A- on the test.
These are 'funny single quotes'.
These are 'fancy single quotes'.
These are  fancy double quotes .
**Hello**, world!!! I wonder... How are *you* doing?!?! lololol
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello,
world!
Hello,
world!
Hello, world!
Hello, world!
Hello,
world ! 
El nino se asusto del pinguino -- que miedo!
El nino se asusto del pinguino -- que miedo!
Le garcon est tres excite pour la foret.
Le garcon est tres excite pour la foret.
Hello, {name}!
Hello, world (DeWilde et al., _NUMBER_, p. _NUMBER_)!
Hello, world (_NUMBER_)!
Hello, world !
Hello, world (and whomever it 
