In this notebook, we will see how to use NumPy's string functions, which live in the np.char module. Each of these functions returns a new array and leaves the original array unmodified.

In [1]:
import numpy as np

Let's start by defining an array which we will use as our example.

In [2]:
states_list = ['Alabama', 'Alaska', 'Arizona', 'Arkansas',
               'California', 'Colorado', 'Connecticut', 'Delaware',
               'Florida', 'Georgia', 'Hawaii', 'Idaho',
               'Illinois', 'Indiana', 'Iowa', 'Kansas',
               'Kentucky', 'Louisiana', 'Maine', 'Maryland',
               'Massachusetts', 'Michigan', 'Minnesota',
               'Mississippi', 'Missouri', 'Montana', 'Nebraska',
               'Nevada', 'New Hampshire', 'New Jersey',
               'New Mexico', 'New York', 'North Carolina',
               'North Dakota', 'Ohio', 'Oklahoma',
               'Oregon', 'Pennsylvania', 'Rhode Island',
               'South Carolina', 'South Dakota', 'Tennessee',
               'Texas', 'Utah', 'Vermont', 'Virginia',
               'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
states_array = np.array(states_list).reshape(10, -1)
states_array

array([['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California'],
       ['Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia'],
       ['Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa'],
       ['Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland'],
       ['Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri'],
       ['Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey'],
       ['New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio'],
       ['Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island',
        'South Carolina'],
       ['South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont'],
       ['Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']], 
      dtype='|S14')

In [3]:
states_array.shape

(10L, 5L)

Our test array thus contains 50 elements organized in 10 rows of 5 columns each.

Let's start by converting all elements to lowercase. This is easily accomplished by calling the lower(a) function.

In [4]:
np.char.lower(states_array)

array([['alabama', 'alaska', 'arizona', 'arkansas', 'california'],
       ['colorado', 'connecticut', 'delaware', 'florida', 'georgia'],
       ['hawaii', 'idaho', 'illinois', 'indiana', 'iowa'],
       ['kansas', 'kentucky', 'louisiana', 'maine', 'maryland'],
       ['massachusetts', 'michigan', 'minnesota', 'mississippi', 'missouri'],
       ['montana', 'nebraska', 'nevada', 'new hampshire', 'new jersey'],
       ['new mexico', 'new york', 'north carolina', 'north dakota', 'ohio'],
       ['oklahoma', 'oregon', 'pennsylvania', 'rhode island',
        'south carolina'],
       ['south dakota', 'tennessee', 'texas', 'utah', 'vermont'],
       ['virginia', 'washington', 'west virginia', 'wisconsin', 'wyoming']], 
      dtype='|S14')
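
As a quick check that these functions leave their input untouched, here is a minimal sketch on a small stand-in array. The names `sample` and `lowered` are chosen for illustration only and are not part of the notebook.

```python
import numpy as np

# Small stand-in array (hypothetical; not the notebook's states_array).
sample = np.array(['Alabama', 'New York'])

# lower() builds and returns a new array.
lowered = np.char.lower(sample)

print(lowered.tolist())  # ['alabama', 'new york']
print(sample.tolist())   # ['Alabama', 'New York'] (input is unchanged)
```

To keep the converted values, assign the result to a name, as done with `lowered` here.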

Similarly, all elements can be converted to uppercase by calling the upper(a) function.

In [5]:
np.char.upper(states_array)

array([['ALABAMA', 'ALASKA', 'ARIZONA', 'ARKANSAS', 'CALIFORNIA'],
       ['COLORADO', 'CONNECTICUT', 'DELAWARE', 'FLORIDA', 'GEORGIA'],
       ['HAWAII', 'IDAHO', 'ILLINOIS', 'INDIANA', 'IOWA'],
       ['KANSAS', 'KENTUCKY', 'LOUISIANA', 'MAINE', 'MARYLAND'],
       ['MASSACHUSETTS', 'MICHIGAN', 'MINNESOTA', 'MISSISSIPPI', 'MISSOURI'],
       ['MONTANA', 'NEBRASKA', 'NEVADA', 'NEW HAMPSHIRE', 'NEW JERSEY'],
       ['NEW MEXICO', 'NEW YORK', 'NORTH CAROLINA', 'NORTH DAKOTA', 'OHIO'],
       ['OKLAHOMA', 'OREGON', 'PENNSYLVANIA', 'RHODE ISLAND',
        'SOUTH CAROLINA'],
       ['SOUTH DAKOTA', 'TENNESSEE', 'TEXAS', 'UTAH', 'VERMONT'],
       ['VIRGINIA', 'WASHINGTON', 'WEST VIRGINIA', 'WISCONSIN', 'WYOMING']], 
      dtype='|S14')

Call swapcase(a) to return a new array with uppercase characters converted to lowercase and vice versa for each element of the original array.

In [6]:
np.char.swapcase(states_array)

array([['aLABAMA', 'aLASKA', 'aRIZONA', 'aRKANSAS', 'cALIFORNIA'],
       ['cOLORADO', 'cONNECTICUT', 'dELAWARE', 'fLORIDA', 'gEORGIA'],
       ['hAWAII', 'iDAHO', 'iLLINOIS', 'iNDIANA', 'iOWA'],
       ['kANSAS', 'kENTUCKY', 'lOUISIANA', 'mAINE', 'mARYLAND'],
       ['mASSACHUSETTS', 'mICHIGAN', 'mINNESOTA', 'mISSISSIPPI', 'mISSOURI'],
       ['mONTANA', 'nEBRASKA', 'nEVADA', 'nEW hAMPSHIRE', 'nEW jERSEY'],
       ['nEW mEXICO', 'nEW yORK', 'nORTH cAROLINA', 'nORTH dAKOTA', 'oHIO'],
       ['oKLAHOMA', 'oREGON', 'pENNSYLVANIA', 'rHODE iSLAND',
        'sOUTH cAROLINA'],
       ['sOUTH dAKOTA', 'tENNESSEE', 'tEXAS', 'uTAH', 'vERMONT'],
       ['vIRGINIA', 'wASHINGTON', 'wEST vIRGINIA', 'wISCONSIN', 'wYOMING']], 
      dtype='|S14')

The capitalize(a) function returns a new array in which the first character of each element is capitalized and all remaining characters are lowercased. Note that this turns 'New Hampshire' into 'New hampshire'.

In [7]:
np.char.capitalize(states_array)

array([['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California'],
       ['Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia'],
       ['Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa'],
       ['Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland'],
       ['Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri'],
       ['Montana', 'Nebraska', 'Nevada', 'New hampshire', 'New jersey'],
       ['New mexico', 'New york', 'North carolina', 'North dakota', 'Ohio'],
       ['Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode island',
        'South carolina'],
       ['South dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont'],
       ['Virginia', 'Washington', 'West virginia', 'Wisconsin', 'Wyoming']], 
      dtype='|S14')
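
Because capitalize(a) lowercases everything after the first character, multi-word names lose their inner capitals. If the goal is to capitalize every word, np.char.title(a) may be a better fit. The sketch below compares the two on a small stand-in array; `sample` is an illustrative name, not part of the notebook.

```python
import numpy as np

# Small stand-in array (hypothetical; not the notebook's states_array).
sample = np.array(['new hampshire', 'OHIO'])

# capitalize(): first character upper, the rest lower.
capitalized = np.char.capitalize(sample)

# title(): first character of every word upper, the rest lower.
titled = np.char.title(sample)

print(capitalized.tolist())  # ['New hampshire', 'Ohio']
print(titled.tolist())       # ['New Hampshire', 'Ohio']
```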

To return fixed-width strings with each element center-aligned, call the *center(a, width, fillchar=' ')* function. Here,
- *a* is the array,
- *width* is the fixed width, and
- *fillchar* is the padding character; the default is a space

For left-justified or right-justified elements, the functions are *ljust(a, width, fillchar=' ')* and *rjust(a, width, fillchar=' ')*, respectively.

In [8]:
np.char.center(states_array, 15, '*')

array([['****Alabama****', '*****Alaska****', '****Arizona****',
        '****Arkansas***', '***California**'],
       ['****Colorado***', '**Connecticut**', '****Delaware***',
        '****Florida****', '****Georgia****'],
       ['*****Hawaii****', '*****Idaho*****', '****Illinois***',
        '****Indiana****', '******Iowa*****'],
       ['*****Kansas****', '****Kentucky***', '***Louisiana***',
        '*****Maine*****', '****Maryland***'],
       ['*Massachusetts*', '****Michigan***', '***Minnesota***',
        '**Mississippi**', '****Missouri***'],
       ['****Montana****', '****Nebraska***', '*****Nevada****',
        '*New Hampshire*', '***New Jersey**'],
       ['***New Mexico**', '****New York***', '*North Carolina',
        '**North Dakota*', '******Ohio*****'],
       ['****Oklahoma***', '*****Oregon****', '**Pennsylvania*',
        '**Rhode Island*', '*South Carolina'],
       ['**South Dakota*', '***Tennessee***', '*****Texas*****',
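        '******Utah*****', '****Vermont****'],
       ['****Virginia***', '***Washington**', '*West Virginia*',
        '***Wisconsin***', '****Wyoming****']],
      dtype='|S15')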
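
The left- and right-justified variants mentioned above work the same way as center. Here is a minimal sketch on a small stand-in array; `sample`, `left`, and `right` are illustrative names, and the width of 8 is an arbitrary choice.

```python
import numpy as np

# Small stand-in array (hypothetical; not the notebook's states_array).
sample = np.array(['Iowa', 'Texas'])

# Pad on the right so the text sits at the left edge.
left = np.char.ljust(sample, 8, '*')

# Pad on the left so the text sits at the right edge.
right = np.char.rjust(sample, 8, '*')

print(left.tolist())   # ['Iowa****', 'Texas***']
print(right.tolist())  # ['****Iowa', '***Texas']
```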

To replace all occurrences of a substring with a new substring, call the *replace(a, old, new)* function.

In [9]:
np.char.replace(states_array, ' ', '_')

array([['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California'],
       ['Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia'],
       ['Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa'],
       ['Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland'],
       ['Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri'],
       ['Montana', 'Nebraska', 'Nevada', 'New_Hampshire', 'New_Jersey'],
       ['New_Mexico', 'New_York', 'North_Carolina', 'North_Dakota', 'Ohio'],
       ['Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode_Island',
        'South_Carolina'],
       ['South_Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont'],
       ['Virginia', 'Washington', 'West_Virginia', 'Wisconsin', 'Wyoming']], 
      dtype='|S14')

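To test whether each element starts with a given prefix, call the *startswith(a, prefix)* function. It returns a boolean array of the same shape as *a*, which can then be used as a mask to select the matching elements.

For example, to find all elements that begin with *"New"*, use: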

In [10]:
states_array[np.char.startswith(states_array, 'New')]

array(['New Hampshire', 'New Jersey', 'New Mexico', 'New York'], 
      dtype='|S14')
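
To make the two steps in that one-liner explicit, the sketch below builds the mask first and then applies it. It uses a small 2x2 stand-in array; `sample`, `mask`, and `selected` are illustrative names, not part of the notebook.

```python
import numpy as np

# Small 2-D stand-in array (hypothetical; not the notebook's states_array).
sample = np.array([['New York', 'Ohio'],
                   ['Nevada', 'New Jersey']])

# Step 1: element-wise test, same shape as the input.
mask = np.char.startswith(sample, 'New')
print(mask.tolist())      # [[True, False], [False, True]]

# Step 2: boolean indexing keeps the matches and flattens them to 1-D.
selected = sample[mask]
print(selected.tolist())  # ['New York', 'New Jersey']
```

This is why the result above is a flat array even though states_array has two dimensions.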

Let's find the elements that aren't completely alphabetic. The *isalpha(a)* function returns True only for elements made up entirely of letters, so for our sample array we expect to get the elements that contain a space.

In [11]:
states_array[~np.char.isalpha(states_array)]

array(['New Hampshire', 'New Jersey', 'New Mexico', 'New York',
       'North Carolina', 'North Dakota', 'Rhode Island', 'South Carolina',
       'South Dakota', 'West Virginia'], 
      dtype='|S14')
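
If spaces should not count against an element, one possible approach is to strip them with replace before calling isalpha. This is only a sketch of that idea on a small stand-in array; `sample` and its contents are made up for illustration.

```python
import numpy as np

# Small stand-in array (hypothetical values chosen for illustration).
sample = np.array(['Utah', 'New York', 'Route 66'])

# Plain isalpha(): any space or digit makes the element non-alphabetic.
strict = np.char.isalpha(sample)

# Remove spaces first, so only non-letter characters such as digits fail.
relaxed = np.char.isalpha(np.char.replace(sample, ' ', ''))

print(strict.tolist())   # [True, False, False]
print(relaxed.tolist())  # [True, True, False]
```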

To find where a substring occurs, we can use the *find(a, sub)* function. For each element, it returns the lowest index at which the substring *sub* is found, or -1 if it is not found.

For example, let's find the elements in our sample array that contain the word *Carolina*.

In [12]:
states_array[np.char.find(states_array, 'Carolina') > -1]

array(['North Carolina', 'South Carolina'], 
      dtype='|S14')
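
To see the indices that find produces before they are compared with -1, here is a minimal sketch on a small stand-in array; `sample` and `positions` are illustrative names, not part of the notebook.

```python
import numpy as np

# Small stand-in array (hypothetical; not the notebook's states_array).
sample = np.array(['North Carolina', 'Ohio', 'South Carolina'])

# Index of the first match in each element, or -1 when there is none.
positions = np.char.find(sample, 'Carolina')
print(positions.tolist())  # [6, -1, 6]

# Comparing with -1 turns the indices into a boolean mask.
print(sample[positions > -1].tolist())  # ['North Carolina', 'South Carolina']
```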