Add initial support for Markdown (Italics) #8

Merged
merged 22 commits into from
Sep 23, 2023
Commits
0a6388c
Began adding markdown support to process_folder method
rjwignar Sep 22, 2023
cdcf508
Modified comments
rjwignar Sep 22, 2023
d32613f
Copied test1.txt and test2.txt into test1.md and test2.md. Added ital…
rjwignar Sep 22, 2023
16d05a5
Changed test1.md and test2.md to test3.md and test4.md
rjwignar Sep 22, 2023
d25b9cd
Deleted test1.md and test2.md from git repo
rjwignar Sep 22, 2023
ea5f3ae
Modified logic that checks if no .txt files found in input_folder to …
rjwignar Sep 22, 2023
aa00d89
Modified logic that processes each .txt file to process each .txt and…
rjwignar Sep 22, 2023
120ae10
Modified process_text_file to convert italics Markdown to HTML using …
rjwignar Sep 22, 2023
8f4aa4f
Removed output test files from repo
rjwignar Sep 22, 2023
6c1fe0a
Fixed indent in line 88 that affected HTML conversion in process_text…
rjwignar Sep 22, 2023
b010a79
Added test3.html and test4.html to examples/test-folder-output
rjwignar Sep 22, 2023
a2ecfb0
Added test5.txt, which contains markdown for italics (should not be c…
rjwignar Sep 22, 2023
708702f
In process_text_file, Modified logic for italics markdown conversion …
rjwignar Sep 22, 2023
cf2df4d
Added test5.html to examples/test-folder-output
rjwignar Sep 22, 2023
0910694
Merge branch 'issue-6' of https://github.com/rjwignar/txt2html into i…
rjwignar Sep 22, 2023
0a23ba5
Moved Markdown regex patterns and italics-conversion logic from proce…
rjwignar Sep 22, 2023
d5f3f5e
Removed commented-out code, added comments to process_line and proces…
rjwignar Sep 22, 2023
576b4e6
Moved italics regex patterns and logic that checks for italics into n…
rjwignar Sep 22, 2023
b7822ee
Update README.md
rjwignar Sep 22, 2023
f870641
Update README.md
rjwignar Sep 22, 2023
dbc348a
Added TO-DO # comments giving suggestions on how to implement bold co…
rjwignar Sep 23, 2023
53cfb9c
Merge branch 'issue-6' of https://github.com/rjwignar/txt2html into i…
rjwignar Sep 23, 2023
5 changes: 5 additions & 0 deletions README.md
@@ -11,6 +11,11 @@
- This is a command-line tool that processes input .txt files and outputs .html files.
- Allow the user to specify either a file or folder of files as input

### Markdown Conversions
- This command-line tool enables the following Markdown conversions to HTML:
- Paragraphs (blank-line separated) are transformed to \<p>Paragraph Content\</p>
- Italics (\*word\* or \_word\_ to \<i>word\</i>)

### Planned features
- [x] User specified output path (version 0.1.1)
- [x] Set title from input file content (version 0.1.2)
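
The two conversions listed in the README hunk above can be sketched in a few lines. This is a minimal illustration, not the code from this PR: `ITALIC`, `italics_to_html`, and `to_paragraphs` are hypothetical names, and the pattern is an assumption that treats a single `*` or `_` pair as italics while leaving doubled delimiters alone.

```python
import re

# Hypothetical pattern (an assumption, not the PR's regex): one * or _ on each
# side of text that contains neither delimiter, with no doubled delimiter.
ITALIC = re.compile(r'(?<![*_])([*_])([^*_\s][^*_]*?)\1(?![*_])')


def italics_to_html(text: str) -> str:
    """Replace *word* or _word_ with <i>word</i>."""
    return ITALIC.sub(r'<i>\2</i>', text)


def to_paragraphs(text: str) -> str:
    """Wrap each blank-line separated block in <p>...</p>."""
    blocks = [b.strip() for b in re.split(r'\n\s*\n', text) if b.strip()]
    return "\n".join(f"<p>{italics_to_html(b)}</p>" for b in blocks)


if __name__ == "__main__":
    print(to_paragraphs("This is the *first* paragraph.\n\nThis is the _second_ paragraph."))
```

Unlike the tool's line-by-line output, this sketch emits no empty `<p></p>` elements, and it would also italicize the middle of an identifier such as `snake_case_name`; both are properties of the sketch, not of the PR.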
18 changes: 18 additions & 0 deletions examples/test-folder-output/test3.html
@@ -0,0 +1,18 @@

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>This is title</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<!-- Generated content here... -->
<h1>This is title</h1><p></p>
<p></p>
<p>This is the <i>first</i> paragraph. </p>
<p></p>
<p>This is the <i>second</i> paragraph. </p>

</body>
</html>
18 changes: 18 additions & 0 deletions examples/test-folder-output/test4.html
@@ -0,0 +1,18 @@

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>test4</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<!-- Generated content here... -->
<p>This is the first paragraph. </p>
<p></p>
<p>This is the 2nd paragraph. </p>
<p></p>
<p>This is the 3rd paragraph. </p>

</body>
</html>
18 changes: 18 additions & 0 deletions examples/test-folder-output/test5.html
@@ -0,0 +1,18 @@

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>test5</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<!-- Generated content here... -->
<p>This is the **first** paragraph.</p>
<p></p>
<p>This is the __2nd__ paragraph.</p>
<p></p>
<p>This is the *3rd* paragraph.</p>

</body>
</html>
6 changes: 6 additions & 0 deletions examples/test-folder/test3.md
@@ -0,0 +1,6 @@
This is title


This is the *first* paragraph.

This is the _second_ paragraph.
5 changes: 5 additions & 0 deletions examples/test-folder/test4.md
@@ -0,0 +1,5 @@
This is the first paragraph.

This is the 2nd paragraph.

This is the 3rd paragraph.
5 changes: 5 additions & 0 deletions examples/test-folder/test5.txt
@@ -0,0 +1,5 @@
This is the **first** paragraph.

This is the __2nd__ paragraph.

This is the *3rd* paragraph.
85 changes: 81 additions & 4 deletions txt2html.py
@@ -3,6 +3,62 @@
import os
import shutil
import argparse
import re

# # TO-DO #1: implement contains_bold(word)
# def contains_bold(word):
# # Define regex pattern for bold syntax (asterisk)

# # Define regex pattern for bold syntax (underscore)

# # Return true if word matches either RegEx pattern, False otherwise using re.search(regex, string)

def contains_italics(word):
    # Markdown Pattern Regular Expressions

    # Matches *word*, *WORD*, *woRd*; does not match **word**
    italic_pattern1 = r'(?<!\*)\*(?:\*|[^*]+)\*(?!\*)'

    # Matches _word_, _WORD_, _woRd_, and also __word__ (the character class excludes '*', not '_')
    italic_pattern2 = r'(?<!\_)_(?:\_|[^*]+)_(?!\_)'

    # Return a match object (truthy) if word matches either RegEx pattern, None otherwise
    return (re.search(italic_pattern1, word) or re.search(italic_pattern2, word))

def process_line(file_line):

    # Split updatedLine into words
    words = file_line.split()

    # Temporary line
    modifiedLine = ""
    for word in words:
        # This if/else structure checks if the word matches a Markdown regex pattern (italics only for now)
        # If the word matches a Markdown regex it is modified with appropriate HTML tags

        # Check if word matches either bold regex pattern:

        # # TO-DO #3: Uncomment lines 43-44 after completing TO-DO #2
        # if contains_bold(word):
        # # TO-DO #2: replace wrapper **...** or __...__ with <b>...</b>
        # # TO-DO #4: Change line 48 to: elif contains_italics(word):

        # Check if word matches either italic regex pattern
        if contains_italics(word):
            # Replace beginning and ending '*' or "_" with <i>...</i> tags
            # Examples:
            # *word* -> <i>word</i>
            # _word_ -> <i>word</i>
            # _word* -> _word*
            # __word__ -> <i>_word_</i> (note: this is an undesired conversion that will
            # be eliminated if you check for bold syntax before checking for italics syntax)
            word = '<i>' + word[1:-1] + '</i>'

        # At the end, add word to modifiedLine whether it was modified or not
        modifiedLine += word + ' '

    return modifiedLine

def process_text_file(input_file, output_folder):
# Read the input file, the input_file has path info
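
The TO-DO comments in the hunk above suggest checking for bold syntax before italics so that `__word__` is not turned into `<i>_word_</i>`. A minimal sketch of that ordering follows. It is not part of this PR: `BOLD`, `ITALIC`, `convert_word`, and `convert_line` are hypothetical names, and anchoring each pattern to the whole word is an assumption made here to keep the example small.

```python
import re

# Hypothetical patterns (assumptions, not the PR's regexes). Both are anchored
# to the whole word, so a word with trailing punctuation is left untouched.
BOLD = re.compile(r'^(\*\*|__)(.+)\1$')
ITALIC = re.compile(r'^([*_])([^*_]+)\1$')


def convert_word(word: str) -> str:
    """Wrap a single word in <b> or <i>, testing bold before italics."""
    bold = BOLD.match(word)
    if bold:
        return f"<b>{bold.group(2)}</b>"
    italic = ITALIC.match(word)
    if italic:
        return f"<i>{italic.group(2)}</i>"
    return word


def convert_line(line: str) -> str:
    """Split on whitespace, convert each word, and rejoin with single spaces."""
    return " ".join(convert_word(w) for w in line.split())


if __name__ == "__main__":
    print(convert_line("This is the __2nd__ paragraph."))
```

Because the bold test runs first, a doubled delimiter never reaches the italics branch, which is the ordering the TO-DO comments describe.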
@@ -21,6 +77,8 @@ def process_text_file(input_file, output_folder):
title = filename
html_title = False



# Read the first line
if len(text_lines) >= 1:
first_line = text_lines[0].strip()
@@ -34,11 +92,23 @@ def process_text_file(input_file, output_folder):

for i in range(1, len(text_lines)):
updatedLine = text_lines[i].strip()

#Check if input_file is Markdown (.md)
if (input_file.endswith(".md")):
# Process updatedLine with additional Markdown conversion logic
updatedLine = process_line(updatedLine)

bodyParagraph += "<p>" + updatedLine + "</p>\n"

if not html_title:
for l in text_lines:
updatedLine = l.strip()

#Check if input_file is Markdown (.md)
if (input_file.endswith(".md")):
# Process updatedLine with additional Markdown conversion logic
updatedLine = process_line(updatedLine)

bodyParagraph += "<p>" + updatedLine + "</p>\n"

# Generate the HTML content
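
The `.md` check and the `<p>` wrapping appear in both loops of the hunk above. One way to hold that logic in a single place is sketched below. This is an illustration only: `is_markdown` and `wrap_line` are hypothetical helpers, and passing the converter in as a `convert` argument is an assumption made so the example runs on its own.

```python
import os
from typing import Callable


def is_markdown(path: str) -> bool:
    """True when the file extension is .md (case-insensitive in this sketch)."""
    return os.path.splitext(path)[1].lower() == ".md"


def wrap_line(line: str, path: str, convert: Callable[[str], str]) -> str:
    """Strip the line, apply Markdown conversion only for .md input, wrap in <p>."""
    text = line.strip()
    if is_markdown(path):
        text = convert(text)
    return f"<p>{text}</p>\n"


if __name__ == "__main__":
    # str.upper stands in for a real converter such as process_line.
    print(wrap_line("  *first*  ", "test3.md", str.upper), end="")
    print(wrap_line("*3rd*", "test5.txt", str.upper), end="")
```

With a helper like this, a `.txt` input such as `test5.txt` keeps its `*` and `_` characters verbatim, matching the behavior shown in `test5.html`.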
@@ -71,12 +141,19 @@ def process_folder(input_folder, output_folder):
     # Get all txt files in the input_folder, for now first depth, not recursive
     txt_files = [f for f in os.listdir(input_folder) if f.endswith(".txt")]

-    if not txt_files:
-        print(f"No .txt files found in {input_folder}.")
+    # Get all md files in the input_folder, for now first depth, not recursive
+    md_files = [f for f in os.listdir(input_folder) if f.endswith(".md")]
+
+    # Combine list of txt files and list of md files into one
+    target_files = txt_files + md_files
+
+    # Stop program if no .txt or .md files found in input_folder
+    if not target_files:
+        print(f"No .txt or .md files found in {input_folder}.")
         return

-    for txt_file in txt_files:
-        # Get the full path to the input .txt file
+    for txt_file in target_files:
+        # Get the full path to the input .txt or .md file
         input_file = os.path.join(input_folder, txt_file)
         process_text_file(input_file, output_folder)
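
The hunk above builds two lists with two `os.listdir` calls and then concatenates them. A single pass is also possible, because `str.endswith` accepts a tuple of suffixes. The sketch below is an alternative, not the merged code: `TARGET_EXTENSIONS` and `list_target_files` are hypothetical names, and sorting the result and skipping directories are assumptions added here.

```python
import os

# Hypothetical constant: the extensions the tool is assumed to accept.
TARGET_EXTENSIONS = (".txt", ".md")


def list_target_files(input_folder: str) -> list:
    """Return .txt and .md file names in input_folder (first level only), sorted."""
    return sorted(
        name
        for name in os.listdir(input_folder)
        if name.endswith(TARGET_EXTENSIONS)
        and os.path.isfile(os.path.join(input_folder, name))
    )


if __name__ == "__main__":
    print(list_target_files("."))
```

Note that the merged code lists all `.txt` files before all `.md` files in directory order, whereas this sketch returns one alphabetically sorted list.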
