# Strings in pandas

pandas Series and DataFrames have built-in methods for text columns, available through the `.str` accessor.
For more information, see the pandas [text data documentation](https://pandas.pydata.org/docs/user_guide/text.html).

Let us build an example data frame from a list of tweets (the `tweets` object).

In [2]:
import pandas as pd

In [6]:
tweets = """Just returned to the @WhiteHouse after a great evening @ Monroe, Louisiana with a massive turnout of Great American Patriots. With early voting underway until Sat, find your polling location below & go vote for your next #LAgov, @EddieRispone! #GeauxVote➡️https://vote.donaldjtrump.com
The degenerate Washington Post MADE UP the story about me asking Bill Barr to hold a news conference. Never happened, and there were no sources!
LOUISIANA! Early voting is underway until Saturday, it’s time to get out and VOTE to REPLACE Radical Liberal Democrat John Bel Edwards with a great new REPUBLICAN Governor, @EddieRispone! #GeauxVote
“Based on the things I’ve seen, the Democrats have no case, or a weak case, at best. I don’t think there are, or will be, well founded articles of Impeachment here.” Robert Wray, respected former prosecutor. It is a phony scam by the Do Nothing Dems! @foxandfriends""".split("\n")

co_df = pd.DataFrame(tweets, columns=["content"])
co_df

Unnamed: 0,content
0,Just returned to the @WhiteHouse after a great...
1,The degenerate Washington Post MADE UP the sto...
2,LOUISIANA! Early voting is underway until Satu...
3,"“Based on the things I’ve seen, the Democrats ..."


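Convert the `content` column to lowercase. The result goes into a new column, `content_lower`, so the original casing remains available for the uppercase-word count later on.

In [None]:
co_df["content_lower"] = co_df["content"].str.lower()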
co_df

Count the number of mentions (an `@` followed by one or more word characters) in each tweet.

In [None]:
co_df["mentions_n"] = co_df["content"].str.count(r"@\w+")
co_df
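As a quick check of what the `@\w+` pattern actually picks up, the sketch below compares `str.count` with `str.findall` on a small made-up Series. The sample strings and variable names are hypothetical and are not part of the tweets above.

```python
import pandas as pd

# Hypothetical sample texts, not taken from the data frame above
sample = pd.Series([
    "thanks @alice and @bob_42 for the help",
    "no mentions here",
    "email me at someone@example.com",
])

mention_pattern = r"@\w+"

# str.count returns the number of non-overlapping regex matches per element
mention_counts = sample.str.count(mention_pattern)

# str.findall returns the matched substrings, which helps when checking a pattern
mention_lists = sample.str.findall(mention_pattern)

print(mention_counts.tolist())  # [2, 0, 1]
print(mention_lists.tolist())   # [['@alice', '@bob_42'], [], ['@example']]
```

The third element shows a limitation: the pattern also matches the part of an email address that follows the `@`, and the lone `@` in "@ Monroe" in the first tweet is not counted because no word character follows it.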

Count the number of uppercase words in each tweet. Use `[]` to define a character class and the `\b` anchor to match
word boundaries. Note that this pattern also counts all-digit tokens and single capital letters such as `A`.

In [None]:
co_df["cries_n"] = co_df["content"].str.count(r"\b[A-Z0-9]+\b")
co_df
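To see which tokens the character class matches, the sketch below runs the same pattern with Python's `re` module on a made-up sentence, next to a stricter variant. The sentence and the stricter pattern are assumptions for illustration, not part of the original analysis.

```python
import re

# Hypothetical sample sentence, not one of the tweets above
text = "I saw 3 HUGE rallies in LA, and the Post MADE UP 2019 numbers"

# Pattern from the cell above: whole tokens made of capitals and/or digits
loose = re.findall(r"\b[A-Z0-9]+\b", text)

# Stricter variant (an assumption about intent): two or more capital letters, no digits
strict = re.findall(r"\b[A-Z]{2,}\b", text)

print(loose)   # ['I', '3', 'HUGE', 'LA', 'MADE', 'UP', '2019']
print(strict)  # ['HUGE', 'LA', 'MADE', 'UP']
```

Capitalized words such as "Post" are skipped by both patterns, because `\b` requires the match to cover the whole token. Either pattern only works on text that still has its original casing.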

Count the length of each tweet in characters.

In [None]:
co_df["characters_n"] = co_df["content"].str.len()
co_df
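One point worth keeping in mind is that `str.len` counts Unicode code points, not bytes or visible glyphs, so an emoji such as the arrow in the first tweet may count as more than one character. A minimal sketch with made-up strings (the Series and variable names are hypothetical):

```python
import pandas as pd

# Hypothetical strings: plain ASCII, an arrow emoji written as two code points
# (U+27A1 followed by the variation selector U+FE0F), and a curly apostrophe
sample = pd.Series(["vote", "vote\u27a1\ufe0f", "it\u2019s"])

# Number of Unicode code points per element
char_counts = sample.str.len()

# Number of bytes per element when encoded as UTF-8
byte_counts = sample.map(lambda s: len(s.encode("utf-8")))

print(char_counts.tolist())  # [4, 6, 4]
print(byte_counts.tolist())  # [4, 10, 6]
```

If a platform's character limit is defined differently (for example by bytes or by grapheme clusters), the `str.len` result will not match it exactly.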

Count the number of words in each tweet by splitting on whitespace.

In [7]:
# Split each tweet on runs of whitespace, then count the tokens per tweet
co_df["words_cnt"] = co_df["content"].str.split().str.len()
co_df

Unnamed: 0,content,words_cnt
0,Just returned to the @WhiteHouse after a great...,40
1,The degenerate Washington Post MADE UP the sto...,25
2,LOUISIANA! Early voting is underway until Satu...,30
3,"“Based on the things I’ve seen, the Democrats ...",47
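The tweets above have no leading or trailing whitespace, so splitting on an explicit `\s+` pattern and splitting with no pattern give the same word counts. On messier text the two differ, as this sketch with made-up strings suggests (the Series and variable names are hypothetical):

```python
import pandas as pd

# Hypothetical strings with padding and an empty value
sample = pd.Series(["  two words ", "three short words", ""])

# No pattern: split on runs of whitespace and drop empty leading/trailing tokens
default_counts = sample.str.split().str.len()

# Explicit regex: leading/trailing whitespace produces empty-string tokens
regex_counts = sample.str.split(r"\s+").str.len()

print(default_counts.tolist())  # [2, 3, 0]
print(regex_counts.tolist())    # [4, 3, 1]
```

Stripping the text first with `str.strip()` removes the padding tokens, although an empty string still yields one empty token when split on a regex.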
