Added str-take

Sean committed a9414ee43bfa5ec400569e6d60a3057348fec704 (1 parent: 0b458fb)
Showing with 230 additions and 79 deletions.
  1. +2 −2 README
  2. +100 −27 README.html
  3. +47 −13 src/devlinsf/str_utils.clj
  4. +81 −37 test/devlinsf/str-utils-test.clj
4 README
@@ -1,8 +1,8 @@
This is a proposed change for str-utils. There are a few key changes, which can be summarized as follows.
-* The re-* methods can now take a list of regular expressions, and each is applied recursively.
+* The re-* methods can now take a list of regexes, and each is applied recursively.
* Several utility methods have been added to simplify common string manipulations.
-* before & after methods have been created, which simplify parsing.
+* str-before & str-after methods have been created, which simplify splitting once on a regex.
* re-split is written in terms of re-partition. As a result, re-split is now lazy.
=== Installation ===
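The README states that re-split is now written in terms of re-partition and is therefore lazy. The following minimal sketch, in modern Clojure, illustrates the partitioning behaviour this relies on: a lazy sequence that alternates between the text between matches and the matches themselves, starting with "" when the string begins with a match. The names `re-partition-sketch` and `re-split-sketch` are hypothetical, and this is not the fork's multimethod implementation (which also accepts lists of patterns and option maps).

```clojure
;; Hypothetical sketch, not the str-utils implementation.
(defn re-partition-sketch
  "Lazily splits s into alternating pieces: the text before the first
  match, the match, the text up to the next match, and so on. The first
  element is \"\" when s begins with a match."
  [^java.util.regex.Pattern re ^String s]
  (let [m (re-matcher re s)]
    ((fn step [prev-end]
       (lazy-seq
        (if (.find m)
          ;; emit the gap before this match, then the matched text itself
          (cons (subs s prev-end (.start m))
                (cons (.group m)
                      (step (.end m))))
          ;; no more matches: emit any trailing text
          (when (< prev-end (count s))
            (list (subs s prev-end))))))
     0)))

(defn re-split-sketch
  "Keeps only the non-matching pieces (every other element)."
  [re s]
  (take-nth 2 (re-partition-sketch re s)))
```

Because both `lazy-seq` and `take-nth` are lazy, `re-split-sketch` only scans as far into the string as the caller consumes.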
127 README.html
@@ -30,13 +30,17 @@
padding:5px;
margin:0px;
}
+ .ns{
+ color:green;
+ font-weight:bold;
+ }
</style>
</head>
<body>
<div id="page-content">
<h1 style="text-align:center">My Proposed changes to str-utils </h1>
<h4 style="text-align:center">Sean Devlin</h4>
- <h4 style="text-align:center">March 23, 2009</h4>
+ <h4 style="text-align:center">May 13, 2009</h4>
<p>
I've been reviewing the str-utils package, and I'd like to propose a few changes to the library. I've included the code at the bottom.
</p>
@@ -53,18 +57,18 @@ <h4 style="text-align:center">March 23, 2009</h4>
</code>
</div>
<p>
- repeatedly. The two most interesting classes are <code>java.util.regex.Pattern</code>, and <code>clojure.lang.PersistentList</code>. I deliberately decided to <b>not</b>
+ repeatedly. The two most interesting classes are <code class="ns">java.util.regex.Pattern</code>, and <code class="ns">clojure.lang.PersistentList</code>. I deliberately decided to <b>not</b>
use sequences, because I believed order was important. One method takes a map as an input, but this is so that a tuple could be passed as an options
hash.
</p>
- <h3><code>re-partion[input-string & remaining-inputs](...)</code></h3>
+ <h2><code>re-partition[input-string & remaining-inputs](...)</code></h2>
<div class="function-description">
This method behaves like the original re-partition method, except that remaining-inputs can be either a pattern or a list of patterns. It returns a lazy sequence, and
is used as the basis for several other methods.
</div>
- <h3><code>re-split[input-string & remaining-inputs](...)</code></h3>
+ <h2><code>re-split[input-string & remaining-inputs](...)</code></h2>
<div class="function-description">
The remaining inputs can be dispatched based on a regex pattern, a list of patterns, or a map. The regex method is the basis, and does the actual work.<br>
@@ -113,7 +117,7 @@ <h4 style="text-align:center">March 23, 2009</h4>
In my opinion, the <code>:marshal-fn</code> is best used at the end of the list. It can be placed earlier in the list, but an exception will most likely be thrown.
</div>
- <h3><code>re-gsub[input-string & remaining-inputs](...)</code></h3>
+ <h2><code>re-gsub[input-string & remaining-inputs](...)</code></h2>
<div class="function-description">
This method can take a list or two atoms as the remaining inputs.<br>
@@ -133,7 +137,7 @@ <h4 style="text-align:center">March 23, 2009</h4>
</div>
- <h3><code>re-sub[input-string & remaining-inputs](...)</code></h3>
+ <h2><code>re-sub[input-string & remaining-inputs](...)</code></h2>
<div class="function-description">
Again, this method can take a list or two atoms as the remaining inputs.<br>
@@ -198,37 +202,103 @@ <h4 style="text-align:center">March 23, 2009</h4>
</p>
- <h2>New Parsing Helpers</h2>
+ <h2>String Seq Utils</h2>
<div class="function-description">
- I've created four methods, <code>str-before, str-before-inc, str-after, str-after-inc</code>. They are designed to help strip off parts of string before a regex.<br>
-
+ The contrib version of str-utils contains the <code>str-join</code> function. This is a string-specific version of the more general <code>interpose</code> function. It inspired the creation of
+ four other functions, <code>str-take, str-drop, str-rest & str-reverse</code>. They mimic the behavior of the corresponding sequence operations, except that they return strings instead of
+ sequences. Also, <code>str-take</code> and <code>str-drop</code> can alternatively take a regex as an input.<br>
+
+ <h2><code>str-take</code></h2>
+ <p>
+ This function is designed to be similar to the <code>take</code> function from the core, except that it applies the <code>str</code> function to the resulting sequence. It can also take a regex
+ instead of an integer, in which case it returns everything before the first match. Be careful not to combine a regex with a non-string sequence, as this will throw an error. Finally, an optional options map,
+ <code>{:include true}</code>, can be passed to include the matched text in the result.
+ </p>
<div class="code-block">
<code>
<table>
<tr>
- <td>(str-before "Clojure Is Awesome" #"\s")</td>
+ <td>(str-take 7 "Clojure Is Awesome")</td>
<td>=></td>
<td>"Clojure"</td>
</tr>
<tr>
- <td>(str-before-inc "Clojure Is Awesome" #"\s")</td>
+ <td>(str-take 2 ["Clojure" "Is" "Awesome"])</td>
+ <td>=></td>
+ <td>"ClojureIs"</td>
+ </tr>
+ <tr>
+ <td>(str-take #"\s+" "Clojure Is Awesome")</td>
+ <td>=></td>
+ <td>"Clojure"</td>
+ </tr>
+ <tr>
+ <td>(str-take #"\s+" "Clojure Is Awesome" <br>&nbsp;&nbsp; {:include true})</td>
<td>=></td>
<td>"Clojure "</td>
</tr>
<tr>
- <td>(str-after "Clojure Is Awesome" #"\s")</td>
+ <td>(str-take #"\s+" ["Clojure" "Is" "Awesome"])</td>
+ <td>=></td>
+ <td style="color:red;"><b>error</b></td>
+ </tr>
+ </table>
+ </code>
+ </div>
+
+ <h2><code>str-drop</code></h2>
+ <p>
+ This function is designed to be similar to the <code>drop</code> function from the core, except that it applies the <code>str</code> function to the resulting sequence. It can also take a regex
+ instead of an integer, in which case it returns everything after the first match. Be careful not to combine a regex with a non-string sequence, as this will throw an error. Finally, an optional options map,
+ <code>{:include true}</code>, can be passed to keep the matched text at the start of the result.
+ </p>
+
+ <div class="code-block">
+ <code>
+ <table>
+ <tr>
+ <td>(str-drop 8 "Clojure Is Awesome")</td>
+ <td>=></td>
+ <td>"Is Awesome"</td>
+ </tr>
+ <tr>
+ <td>(str-drop 1 ["Clojure" "Is" "Awesome"])</td>
+ <td>=></td>
+ <td>"IsAwesome"</td>
+ </tr>
+ <tr>
+ <td>(str-drop #"\s+" "Clojure Is Awesome")</td>
<td>=></td>
<td>"Is Awesome"</td>
</tr>
<tr>
- <td>(str-after-inc "Clojure Is Awesome" #"\s")</td>
+ <td>(str-drop #"\s+" "Clojure Is Awesome" <br>&nbsp;&nbsp; {:include true})</td>
<td>=></td>
<td>" Is Awesome"</td>
+ </tr>
+ <tr>
+ <td>(str-drop #"\s+" ["Clojure" "Is" "Awesome"])</td>
+ <td>=></td>
+ <td style="color:red;"><b>error</b></td>
</tr>
</table>
</code>
</div>
-
+
+ <h2><code>str-rest</code></h2>
+ <p>This function applies <code>str</code> to the <code>rest</code> of the input. It is equivalent to <code>(str-drop 1 <i>input</i>)</code>.</p>
+ <div class="code-block">
+ <code>
+ <table>
+ <tr>
+ <td>(str-rest (str :Clojure))</td>
+ <td>=></td>
+ <td>"Clojure"</td>
+ </tr>
+ </table>
+ </code>
+ </div>
+
<!--
<table>
<tr>
@@ -238,12 +308,24 @@ <h4 style="text-align:center">March 23, 2009</h4>
</tr>
</table> -->
- These methods can be used to help parse strings<br>
+ <h2><code>str-reverse</code></h2>
+ <div class="brief-function-description">
+ This method reverses a string.<br>
+ <div class="code-block">
+ <code>
+ (str-reverse "Clojure") => "erujolC"
+ </code>
+ </div>
+ </div>
+
+ <h3>An Example</h3>
+ These functions can be combined to parse strings, as in the example below.<br>
<div class="code-block">
<code>
- (str-before (str-after "&lt h4 ... &gt" #"&lth4") "&gt") <br>=> ;the stuff in the middle
+ (str-take #"&gt;" (str-drop #"&lt; h4" "&lt; h4 ... &gt;")) <br>=> ;the stuff in the middle
</code>
</div>
+
</div>
<h2>
@@ -251,15 +333,6 @@ <h4 style="text-align:center">March 23, 2009</h4>
</h2>
I've added a few inflectors that I am familiar with from Rails. My apologies if their origin is another language. I'd be interested in knowing where these methods originated.
- <h4>str-reverse</h4>
- <div class="brief-function-description">
- This methods reverses a string<br>
- <div class="code-block">
- <code>
- (str-reverse "Clojure") => "erujolC"
- </code>
- </div>
- </div>
<h4>trim</h4>
<div class="brief-function-description">
@@ -409,7 +482,7 @@ <h4 style="text-align:center">March 23, 2009</h4>
<tr>
<td>(singularize "beaches")</td>
<td>=></td>
- <td>("beach")</td>
+ <td>"beach"</td>
</tr>
<tr>
<td>(singularize "babies")</td>
@@ -428,7 +501,7 @@ <h4 style="text-align:center">March 23, 2009</h4>
<h2>Closing thoughts</h2>
<p>
- There are three more methods, str-join, chop, and chomp that were already in str-utils. I change the implementation of the methods, but the behavior should be the same.
+ There are three more methods, <code>str-join, chop, & chomp</code>, that were already in str-utils. I changed the implementation of these methods, but their behavior should be the same.
</p>
<p>
There is a big catch with my proposed change. The signature of re-split, re-partition, re-gsub and re-sub changes. They will not be backwards compatible, and will break code. However, I think the flexibility is worth it.
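The str-take and str-drop examples in README.html can be checked against a small stand-alone sketch. It mirrors the proposal's design of dispatching on the class of the first argument, but finds the first match with `re-matcher` directly instead of going through re-partition. The names `str-take-sketch` and `str-drop-sketch` are hypothetical; this illustrates the documented behaviour and is not the fork's code.

```clojure
;; Hypothetical sketches of the documented str-take / str-drop behaviour.
(defmulti str-take-sketch (fn [param & _] (class param)))
(defmulti str-drop-sketch (fn [param & _] (class param)))

(defmethod str-take-sketch java.util.regex.Pattern
  ([re s] (str-take-sketch re s {}))
  ([re s {:keys [include]}]
   (let [m (re-matcher re s)]
     (if (.find m)
       ;; text before the first match, plus the match itself when :include
       (subs s 0 (if include (.end m) (.start m)))
       ;; no match: keep the whole string
       s))))

(defmethod str-take-sketch :default
  [n coll]
  ;; like core take, but glue the taken elements back into a string
  (apply str (take n coll)))

(defmethod str-drop-sketch java.util.regex.Pattern
  ([re s] (str-drop-sketch re s {}))
  ([re s {:keys [include]}]
   (let [m (re-matcher re s)]
     (if (.find m)
       ;; text after the first match, starting at the match when :include
       (subs s (if include (.start m) (.end m)))
       ;; no match: nothing follows it
       ""))))

(defmethod str-drop-sketch :default
  [n coll]
  ;; like core drop, but glue the remaining elements back into a string
  (apply str (drop n coll)))
```

Passing a regex together with a vector fails inside `re-matcher`, which matches the "error" rows in the tables. With these definitions, `(str-take-sketch #">" (str-drop-sketch #"<h4" "<h4 ... >"))` returns `" ... "`, the text between the two delimiters.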
60 src/devlinsf/str_utils.clj
@@ -4,11 +4,6 @@
;;; String Merging & Slicing
-(defn str-join
- "Returns a string of all elements in 'sequence', separated by
- 'separator'. Like Perl's 'join'."
- [separator sequence]
- (apply str (interpose separator sequence)))
(defmulti re-partition (fn[input-string & remaining-inputs] (class (first remaining-inputs))))
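The README says the re-* functions can now take a list of regexes, each applied in turn, and the re-gsub tests pass a list of (regex replacement) pairs. A minimal sketch of that idea as a left-to-right `reduce` over the pairs, using `clojure.string/replace`; the name `re-gsub-pairs-sketch` is hypothetical, and the fork's own recursion over the list may apply the pairs in a different order.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sketch: apply each (regex replacement) pair in turn,
;; replacing every match, as re-gsub does for a single pair.
(defn re-gsub-pairs-sketch
  [s pairs]
  (reduce (fn [acc [re replacement]]
            (string/replace acc re replacement))
          s
          pairs))
```

The pairs may be given as vectors or as a quoted list, as in the test file, since destructuring works on both.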
@@ -131,6 +126,40 @@
(re-sub (re-sub input-string (reverse remaining)) (first pair) (second pair)))))
;;; Parsing Helpers
+(defmulti str-take (fn[parameter & remaining] (class parameter)))
+
+(defmethod str-take java.util.regex.Pattern
+ ([parameter input-string]
+ (str-take parameter input-string {}))
+ ([parameter input-string options-map]
+ (let [matches (re-partition input-string parameter)]
+ (if (options-map :include)
+ (apply str (take 2 matches))
+ (first matches)))))
+
+(defmethod str-take :default
+ [parameter input-string]
+ (apply str (take parameter input-string)))
+
+(defn str-rest
+ [#^String input-string]
+ (apply str (rest input-string)))
+
+(defmulti str-drop (fn[parameter & remaining] (class parameter)))
+
+(defmethod str-drop java.util.regex.Pattern
+ ([parameter input-string]
+ (str-drop parameter input-string {}))
+ ([parameter input-string options-map]
+ (let [matches (re-partition input-string parameter)]
+ (if (options-map :include)
+ (apply str (rest matches))
+ (apply str (drop 2 matches))))))
+
+(defmethod str-drop :default
+ [parameter input-string]
+ (apply str (drop parameter input-string)))
+
(defn str-before [#^String input-string #^java.util.regex.Pattern regex]
(let [matches (re-partition input-string regex)]
(first matches)))
@@ -147,14 +176,19 @@
(let [matches (re-partition input-string regex)]
(apply str (rest matches))))
-
-;;; Inflectors
-;;; These methods only take the input string.
(defn str-reverse
"This method accepts a string and returns the reversed string as a result"
[#^String input-string]
(apply str (reverse input-string)))
-
+
+(defn str-join
+ "Returns a string of all elements in 'sequence', separated by
+ 'separator'. Like Perl's 'join'."
+ [separator sequence]
+ (apply str (interpose separator sequence)))
+
+;;; Inflectors
+;;; These methods only take the input string.
(defn upcase
"Converts the entire string to upper case"
[#^String input-string]
@@ -178,12 +212,12 @@
(defn ltrim
"This method chops all of the leading whitespace."
[#^String input-string]
- (str-after input-string #"\s+"))
+ (if (re-find #"^\s" input-string) (str-drop #"\s+" input-string) input-string))
(defn rtrim
"This method chops all of the trailing whitespace."
[#^String input-string]
- (str-reverse (str-after (str-reverse input-string) #"\s+")))
+ (let [r (str-reverse input-string)] (str-reverse (if (re-find #"^\s" r) (str-drop #"\s+" r) r))))
(defn chop
"Removes the last character of string."
@@ -194,14 +228,14 @@
"Removes all trailing newline \\n or return \\r characters from
string. Note: String.trim() is similar and faster."
[#^String input-string]
- (str-before input-string #"[\r\n]+"))
+ (str-take #"[\r\n]+$" input-string))
(defn capitalize
"This method turns a string into a capitalized version, Xxxx"
[#^String input-string]
(str-join "" (list
(upcase (str (first input-string)))
- (downcase (apply str (rest input-string))))))
+ (downcase (str-rest input-string)))))
(defn titleize
"This method takes an input string, splits it across whitespace, dashes, and underscores. Each word is capitalized, and the result is joined with \" \"."
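ltrim, rtrim and chomp are built on str-drop, str-take and str-reverse. When testing them, inputs with nothing to remove (no leading whitespace, no trailing newline, or whitespace only in the middle) are the interesting cases. The sketch below states the intended behaviour with anchored regexes, reusing the reverse trick for the right-hand trim; the function names are hypothetical. Current Clojure also ships equivalents as `clojure.string/triml`, `clojure.string/trimr` and `clojure.string/trim-newline`, which the checks compare against.

```clojure
(require '[clojure.string :as string])

;; Hypothetical reference versions; names are illustrative only.
(defn ltrim-sketch
  "Removes leading whitespace; returns s unchanged when there is none."
  [s]
  (string/replace-first s #"^\s+" ""))

(defn rtrim-sketch
  "Removes trailing whitespace by reversing, left-trimming, and reversing back."
  [s]
  (string/reverse (ltrim-sketch (string/reverse s))))

(defn chomp-sketch
  "Removes only the trailing run of \\r and \\n characters."
  [s]
  (string/replace s #"[\r\n]+$" ""))
```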
118 test/devlinsf/str-utils-test.clj
@@ -35,40 +35,48 @@
(is (= (capitalize "clojure") "Clojure")))
(deftest test-titleize
- (let [expected-string "Clojure Is Awesome"]
- (is (= (titleize "clojure is awesome") expected-string))
- (is (= (titleize "clojure is awesome") expected-string))
- (is (= (titleize "CLOJURE IS AWESOME") expected-string))
- (is (= (titleize "clojure-is-awesome") expected-string))
- (is (= (titleize "clojure- _ is---awesome") expected-string))
- (is (= (titleize "clojure_is_awesome") expected-string))))
+ (are
+ (let [expected-string "Clojure Is Awesome"]
+ (= (titleize _1) expected-string))
+ "clojure is awesome"
+ "clojure is awesome"
+ "CLOJURE IS AWESOME"
+ "clojure-is-awesome"
+ "clojure- _ is---awesome"
+ "clojure_is_awesome"))
(deftest test-camelize
- (let [expected-string "clojureIsAwesome"]
- (is (= (camelize "clojure is awesome") expected-string))
- (is (= (camelize "clojure is awesome") expected-string))
- (is (= (camelize "CLOJURE IS AWESOME") expected-string))
- (is (= (camelize "clojure-is-awesome") expected-string))
- (is (= (camelize "clojure- _ is---awesome") expected-string))
- (is (= (camelize "clojure_is_awesome") expected-string))))
+ (are
+ (let [expected-string "clojureIsAwesome"]
+ (= (camelize _1) expected-string))
+ "clojure is awesome"
+ "clojure is awesome"
+ "CLOJURE IS AWESOME"
+ "clojure-is-awesome"
+ "clojure- _ is---awesome"
+ "clojure_is_awesome"))
(deftest test-underscore
- (let [expected-string "clojure_is_awesome"]
- (is (= (underscore "clojure is awesome") expected-string))
- (is (= (underscore "clojure is awesome") expected-string))
- (is (= (underscore "CLOJURE IS AWESOME") expected-string))
- (is (= (underscore "clojure-is-awesome") expected-string))
- (is (= (underscore "clojure- _ is---awesome") expected-string))
- (is (= (underscore "clojure_is_awesome") expected-string))))
+ (are
+ (let [expected-string "clojure_is_awesome"]
+ (= (underscore _1) expected-string))
+ "clojure is awesome"
+ "clojure is awesome"
+ "CLOJURE IS AWESOME"
+ "clojure-is-awesome"
+ "clojure- _ is---awesome"
+ "clojure_is_awesome"))
(deftest test-dasherize
- (let [expected-string "clojure-is-awesome"]
- (is (= (dasherize "clojure is awesome") expected-string))
- (is (= (dasherize "clojure is awesome") expected-string))
- (is (= (dasherize "CLOJURE IS AWESOME") expected-string))
- (is (= (dasherize "clojure-is-awesome") expected-string))
- (is (= (dasherize "clojure- _ is---awesome") expected-string))
- (is (= (dasherize "clojure_is_awesome") expected-string))))
+ (are
+ (let [expected-string "clojure-is-awesome"]
+ (= (dasherize _1) expected-string))
+ "clojure is awesome"
+ "clojure is awesome"
+ "CLOJURE IS AWESOME"
+ "clojure-is-awesome"
+ "clojure- _ is---awesome"
+ "clojure_is_awesome"))
(deftest test-str-before
(is (= (str-before "Clojure Is Awesome" #"Is") "Clojure ")))
@@ -135,7 +143,7 @@
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-gsub source-string #"\s+" " ") "1 2 3 4 5 6"))
(is (= (re-gsub source-string '((#"\s+" " "))) "1 2 3 4 5 6"))
- (is (= (re-gsub source-string '((#"\s+" " ") (#"\d" "D"))) "D D D D D D")))
+ (is (= (re-gsub source-string '((#"\s+" " ") (#"\d" "D"))) "D D D D D D"))))
(deftest test-re-sub
(let [source-string "1 2 3 4 5 6"]
@@ -162,11 +170,47 @@
"stop" "stops"))
;;;The code for the test-pluralize function was based on functions contributed by Brian Doyle
-(deftest test-pluralize (are (= _1 (pluralize _2))
- "foos" "foo"
- "beaches" "beach"
- "babies" "baby"
- "boxes" "box"
- "bushes" "bush"
- "buses" "bus"
- "stops" "stop"))
+(deftest test-pluralize
+ (are (= _1 (pluralize _2))
+ "foos" "foo"
+ "beaches" "beach"
+ "babies" "baby"
+ "boxes" "box"
+ "bushes" "bush"
+ "buses" "bus"
+ "stops" "stop"))
+
+(deftest test-str-rest
+ (are (= _1 (str-rest _2))
+ "beer" (str :beer)
+ "eer" "Beer"
+ "" "B"
+ "" ""
+ "" '()))
+
+(deftest test-str-take
+ (let [source-string "Be er"]
+ (are
+ (= (str-take _1 source-string) _2)
+ 2 "Be"
+ 10 "Be er"
+ #"\r" "Be er"
+ #"\s+" "Be")
+ (is (= (str-take #"\s+" source-string {:include true}) "Be "))
+ (is (= (str-take #"\s+" source-string {:include false}) "Be"))
+ (is (= (str-take 2 ["B" "e" "e" "r"]) "Be"))
+ (is (= (str-take 2 ["B" "ee" "r"]) "Bee"))
+ (is (= (str-take 2 []) ""))))
+
+(deftest test-str-drop
+ (let [source-string "Be er"]
+ (are (= (str-drop _1 source-string) _2)
+ 2 " er"
+ 10 ""
+ #"\r" ""
+ #"\s+" "er")
+ (is (= (str-drop #"\s+" source-string {:include true}) " er"))
+ (is (= (str-drop #"\s+" source-string {:include false}) "er"))
+ (is (= (str-drop 2 ["B" "e" "e" "r"]) "er"))
+ (is (= (str-drop 2 ["B" "ee" "r"]) "r"))
+ (is (= (str-drop 2 []) ""))))
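The tests in this file use an early form of `are`, whose template refers to its arguments as `_1`, `_2`. Current clojure.test instead takes an explicit argument vector, so these tests would need rewriting when upgrading. A sketch of test-str-rest in that newer form; the namespace name is hypothetical, and str-rest is copied locally (without the type hint) so the block runs on its own.

```clojure
(ns str-utils-are-sketch
  (:require [clojure.test :refer [deftest are run-tests]]))

;; Local copy of the proposal's str-rest so this file is self-contained.
(defn str-rest [s]
  (apply str (rest s)))

;; Each row is [expected input]; are expands to one assertion per row.
(deftest test-str-rest
  (are [expected input] (= expected (str-rest input))
    "beer" (str :beer)
    "eer"  "Beer"
    ""     "B"
    ""     ""
    ""     '()))
```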
