<div id="BBox" class="alert alert-info" style="font-family:courier;color:black;justify-content:left;">
<h1> String Operations </h1>
NumPy’s string operations, available in the <code>np.char</code> module, are a set of functions for manipulating string data stored in arrays. They cover concatenation and repetition, case conversion, finding, counting and replacing substrings, prefix and suffix checks, splitting, and trimming whitespace. They accept NumPy arrays, or Python lists, which are converted to arrays. Each function is vectorized in the sense that a single call applies the operation to every element and returns a new array, so no explicit Python loop is needed. In NumPy 1.x, however, most <code>np.char</code> functions still call the corresponding Python <code>str</code> method on each element internally. The main benefit there is concise, array-level code rather than a large speed-up over a list comprehension. NumPy 2.0 added the ufunc-based <code>numpy.strings</code> module, which the NumPy documentation recommends over the legacy <code>np.char</code> functions. These operations are especially useful in <u>data preprocessing and cleaning, where text must be normalized efficiently and consistently, for example in Natural Language Processing (NLP) tasks and data wrangling workflows</u>. Applying the same array-level operations throughout keeps text handling consistent across the stages of <u>data processing in a machine learning pipeline or data analysis project</u>.
<ul>
<li><strong>np.char.add : </strong> Concatenates two arrays of strings element-wise. It is useful for combining values from different string arrays or for adding a prefix or suffix to every element.</li>
<li><strong>np.char.multiply :</strong> Repeats each string in an array a specified number of times.</li>
<li><strong>np.char.capitalize : </strong>Returns a copy of each string element with its first character upper-cased and the remaining characters lower-cased, as Python's <code>str.capitalize</code> does.</li>
<li><strong>np.char.title : </strong>Converts each string to title case, capitalizing the first letter of each word.</li>
<li><strong>np.char.lower : </strong>Converts all characters in each string element to lowercase.</li>
<li><strong>np.char.upper : </strong>Converts all characters in each string element to uppercase.</li>
<li><strong>np.char.split : </strong>Splits each string element into a list of substrings using a specified separator, which defaults to whitespace. The result is an object array whose elements are Python lists.</li>
<li><strong>np.char.strip : </strong>Removes leading and trailing whitespace, or an optional set of characters, from each string element.</li>
<li><strong>np.char.replace : </strong> Replaces occurrences of a specified substring with another substring within each element of the array.
</li>
<li><strong>np.char.find : </strong> Searches each string element for the first occurrence of a specified substring and returns its lowest index, or <code>-1</code> if the substring is not found.</li>
<li><strong>np.char.count : </strong> Counts the non-overlapping occurrences of a specified substring within each string element.</li>
<li><strong>np.char.join : </strong> Joins the characters of each string element using a specified separator. It does not merge the array's elements into a single string.</li>
<li><strong>np.char.endswith : </strong> Checks whether each string element ends with a specified suffix, returning a boolean array with <code>True</code> or <code>False</code> for each element.</li>
<li><strong>np.char.startswith : </strong> Checks whether each string element starts with a specified prefix, returning a boolean array.</li>
<li><strong>np.char.equal : </strong> Compares two string arrays element-wise and returns <code>True</code> where the elements are equal and <code>False</code> otherwise.</li>
</ul>

</div>
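The following sketch illustrates the preprocessing use case described above by chaining a few of these functions to normalize a text column. The sample values are made up. The cleaning rules (trim, lowercase, collapse a double space) are assumptions about what "clean" means for this hypothetical data.

```python
import numpy as np

# Hypothetical raw values with inconsistent spacing and casing
raw = np.array(['  New York ', 'new york', 'NEW  YORK  ', 'Boston'])

cleaned = np.char.strip(raw)                    # trim both ends
cleaned = np.char.lower(cleaned)                # normalize case
cleaned = np.char.replace(cleaned, '  ', ' ')   # collapse a double space

print(cleaned)             # ['new york' 'new york' 'new york' 'boston']
print(np.unique(cleaned))  # ['boston' 'new york']
```

The single `replace` call only turns each pair of spaces into one. Runs of three or more spaces would need repeated calls or a different approach.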

In [2]:
import numpy as np

In [None]:
array1 = ['Hello ', 'Data ']
array2 = ['World', 'Science']
result = np.char.add(array1, array2)

print(result)

['Hello World' 'Data Science']
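`np.char.add` broadcasts its arguments, so a single string can be combined with every element of an array. This covers the prefix/suffix case mentioned in the list. The ID values below are only illustrative.

```python
import numpy as np

ids = np.array(['001', '002', '003'])  # illustrative IDs

# A scalar string broadcasts against each element of the array
labels = np.char.add('item_', ids)
files = np.char.add(labels, '.csv')

print(labels)  # ['item_001' 'item_002' 'item_003']
print(files)   # ['item_001.csv' 'item_002.csv' 'item_003.csv']
```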


In [5]:
array = ['hello', 'world']
result = np.char.multiply(array, 3)
print(result)

['hellohellohello' 'worldworldworld']
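The repeat count in `np.char.multiply` can also be an array, which broadcasts against the strings so that each element gets its own count. This is a small sketch with made-up inputs. A count of 0 produces an empty string, just as `'xyz' * 0` does in Python.

```python
import numpy as np

words = np.array(['ab', 'c', 'xyz'])
counts = np.array([1, 3, 0])  # one repeat count per element

# Each string is repeated by the matching count; 0 gives ''
result = np.char.multiply(words, counts)
print(result)  # ['ab' 'ccc' '']
```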


In [9]:
result = np.char.capitalize(['hello world', 'data science'])
print(result)

['Hello world' 'Data science']
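`np.char.capitalize` follows Python's `str.capitalize`, so it also lower-cases every character after the first. This matters when the input is already partly upper-case, as in these illustrative strings:

```python
import numpy as np

mixed = np.array(['hELLO wORLD', 'DATA science'])

# First character upper-cased, all other characters lower-cased
result = np.char.capitalize(mixed)
print(result)  # ['Hello world' 'Data science']
```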


In [10]:
result = np.char.title(['hello world', 'data science'])
print(result)

['Hello World' 'Data Science']
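`np.char.title` applies the same word-boundary rule as Python's `str.title`, which treats any non-letter, including an apostrophe, as the start of a new word. The phrases below are illustrative and show why title case can give surprising results on contractions and names.

```python
import numpy as np

phrases = np.array(["it's a test", "o'neil street"])

# The letter after the apostrophe is treated as a new word and capitalized
result = np.char.title(phrases)
print(result.tolist())  # ["It'S A Test", "O'Neil Street"]
```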


In [11]:
result = np.char.lower(['HELLO', 'WORLD'])
print(result)

['hello' 'world']
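A common use of `np.char.lower` is case-insensitive matching: normalize both arrays, then compare them element-wise. The fruit names here are sample data.

```python
import numpy as np

a = np.array(['Apple', 'BANANA', 'cherry'])
b = np.array(['apple', 'banana', 'Grape'])

# Lower-case both sides so the element-wise comparison ignores case
matches = np.char.lower(a) == np.char.lower(b)
print(matches)  # [ True  True False]
```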


In [12]:
result = np.char.upper(['hello', 'world'])
print(result)

['HELLO' 'WORLD']
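`np.char.upper` can normalize codes that should always be upper-case. The sketch below also uses `np.char.isupper`, a related predicate not covered in the list above, to show which inputs were already upper-case. The country codes are illustrative.

```python
import numpy as np

codes = np.array(['us', 'Gb', 'DE'])  # illustrative country codes

normalized = np.char.upper(codes)
already_upper = np.char.isupper(codes)  # True only where every cased character is upper-case

print(normalized)     # ['US' 'GB' 'DE']
print(already_upper)  # [False False  True]
```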


In [22]:
array = ['hello world', 'numpy array']
print(array[0])
print(array[1])
result = np.char.split(array)
print(result[0])
print(result[1])


hello world
numpy array
['hello', 'world']
['numpy', 'array']
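`np.char.split` also takes the `sep` and `maxsplit` arguments of Python's `str.split`. Because each element becomes a Python list, the result is an object array. The date strings below are sample data.

```python
import numpy as np

dates = np.array(['2024-01-15', '1999-12-31'])
parts = np.char.split(dates, sep='-')
print(parts[0])     # ['2024', '01', '15']
print(parts.dtype)  # object

# maxsplit limits the number of splits performed on each element
pairs = np.char.split(np.array(['a,b,c', 'd,e']), sep=',', maxsplit=1)
print(pairs[0])     # ['a', 'b,c']
```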


In [23]:
result = np.char.strip([' hello ', ' world '])
print(result)

['hello' 'world']
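`np.char.strip` accepts an optional string of characters to remove. It is treated as a set of characters, not as a prefix or suffix. The one-sided variants `np.char.lstrip` and `np.char.rstrip` work the same way. The padded strings are made up for illustration.

```python
import numpy as np

padded = np.array(['--hello--', '**world*'])

# Any combination of '-' and '*' is removed from the chosen end(s)
both = np.char.strip(padded, '-*')
left = np.char.lstrip(padded, '-*')
right = np.char.rstrip(padded, '-*')

print(both)   # ['hello' 'world']
print(left)   # ['hello--' 'world*']
print(right)  # ['--hello' '**world']
```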


In [24]:
result = np.char.replace(['hello world', 'data science'], ' ', '_')
print(result)

['hello_world' 'data_science']
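By default `np.char.replace` replaces every occurrence. Its optional `count` argument limits the number of replacements per element, starting from the left. The paths here are illustrative.

```python
import numpy as np

paths = np.array(['a/b/c', 'x/y'])

# Only the first '/' in each element is replaced
result = np.char.replace(paths, '/', '.', count=1)
print(result)  # ['a.b/c' 'x.y']
```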


In [25]:
result = np.char.find(['hello', 'world'], 'o')
print(result)

[4 1]
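The sketch below shows three more uses of `np.char.find`: the `-1` result for a missing substring, the optional `start` argument, and a "contains" filter built from the index. The word list is sample data.

```python
import numpy as np

words = np.array(['hello', 'world', 'numpy'])
print(np.char.find(words, 'o'))           # [ 4  1 -1]

# start restricts the search to s[start:], but the returned index
# is still counted from the beginning of the string
from_three = np.char.find(words, 'l', start=3)
print(from_three)                         # [ 3  3 -1]

# An index >= 0 works as a "contains" test for boolean masking
has_o = words[np.char.find(words, 'o') >= 0]
print(has_o)                              # ['hello' 'world']
```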


In [26]:
result = np.char.count(['hello', 'world'], 'l')
print(result)

[2 1]
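`np.char.count` counts non-overlapping matches, like Python's `str.count`. This gives a lower number than a sliding-window count when matches would overlap, as these sample strings show:

```python
import numpy as np

seqs = np.array(['aaaa', 'banana'])

aa = np.char.count(seqs, 'aa')
ana = np.char.count(seqs, 'ana')
print(aa)   # [2 0]
print(ana)  # [0 1]  (an overlapping count would find 'ana' twice in 'banana')
```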


In [29]:
result = np.char.join('-', ['hello', 'world'])
print(result)

['h-e-l-l-o' 'w-o-r-l-d']
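The separator in `np.char.join` broadcasts, so each element can use its own separator. To combine the array's elements into a single string, use Python's `str.join` instead, since NumPy string elements are `str` subclasses. The inputs are illustrative.

```python
import numpy as np

# One separator per element: '-' for the first string, '.' for the second
result = np.char.join(['-', '.'], ['abc', 'xyz'])
print(result)  # ['a-b-c' 'x.y.z']

# Joining the elements themselves into one string uses plain Python
whole = '-'.join(np.array(['hello', 'world']))
print(whole)   # hello-world
```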


In [30]:
result = np.char.endswith(['hello', 'world'], 'd')
print(result)

[False  True]
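Because `np.char.endswith` returns a boolean array, it can be used directly as a mask, for example to select files by extension. The file names are hypothetical.

```python
import numpy as np

files = np.array(['report.csv', 'image.png', 'data.csv'])  # hypothetical file names

# Keep only the elements whose mask entry is True
csv_files = files[np.char.endswith(files, '.csv')]
print(csv_files)  # ['report.csv' 'data.csv']
```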


In [31]:
result = np.char.startswith(['hello', 'world'], 'h')
print(result)

[ True False]
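The mask from `np.char.startswith` can also be inverted with `~` to select the complementary elements. This sketch splits a hypothetical list of column names by an assumed `feat_` prefix convention.

```python
import numpy as np

columns = np.array(['feat_age', 'feat_income', 'target', 'id'])  # hypothetical column names

is_feature = np.char.startswith(columns, 'feat_')
print(columns[is_feature])   # ['feat_age' 'feat_income']
print(columns[~is_feature])  # ['target' 'id']
```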


In [33]:
result = np.char.equal(['hello', 'world'], ['Hello', 'world'])
print(result)

[False  True]
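According to the NumPy documentation, `np.char.equal` strips trailing whitespace before comparing, a behavior kept for backward compatibility. The `==` operator compares exactly. Neither ignores case, so a case-insensitive comparison still needs normalized inputs. The values below are sample data.

```python
import numpy as np

x = np.array(['aa', 'bb'])
y = np.array(['aa ', 'BB'])

legacy = np.char.equal(x, y)  # trailing space ignored
exact = x == y                # exact comparison
print(legacy)  # [ True False]
print(exact)   # [False False]

# Case-insensitive comparison: normalize both sides first
ci = np.char.equal(np.char.lower(x), np.char.lower(y))
print(ci)      # [ True  True]
```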
