# Strings

At first it may seem that programming languages are only useful for crunching numbers, but we soon realize that we also need to manipulate strings. Strings are ubiquitous: they appear in sample names, locations, table headers, and DNA sequences, and we even print short strings to debug our own code.

## Define strings

In [12]:
# Apostrophes
soil_1 = 'Clay loam'
print(type(soil_1))

# Quotation marks
soil_2 = "Silty loam"
print(type(soil_2))

# Triple apostrophes or triple quotes can define a string spanning multiple lines
soil_definition = """The layer(s) of generally loose mineral and/or organic material 
that are affected by physical, chemical, and/or biological processes
at or near the planetary surface and usually hold liquids, gases, 
and biota and support plants."""

print(type(soil_definition))
print(soil_definition)

<class 'str'>
<class 'str'>
<class 'str'>
The layer(s) of generally loose mineral and/or organic material 
that are affected by physical, chemical, and/or biological processes
at or near the planetary surface and usually hold liquids, gases, 
and biota and support plants.


## Split text lines into a list

In [19]:
# Split text lines into a list

str_lines = soil_definition.splitlines() # Split the string at line boundaries (newline characters)
print(str_lines)

['The layer(s) of generally loose mineral and/or organic material ', 'that are affected by physical, chemical, and/or biological processes', 'at or near the planetary surface and usually hold liquids, gases, ', 'and biota and support plants.']
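Note that some of the lines above keep a trailing space. A minimal sketch of one way to clean that up, assuming a short made-up string (`text` is not part of the original example), is to strip each line after splitting:

```python
# Hypothetical multi-line string with trailing spaces on some lines
text = "Sand \nSilt\nClay \n"

# splitlines() breaks the string at line boundaries;
# rstrip() then removes trailing whitespace from each line
clean_lines = [line.rstrip() for line in text.splitlines()]
print(clean_lines)
```

The final newline in `text` does not produce an empty last element, which is one difference between `splitlines()` and `split('\n')`.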


## Split string on a given character

In [21]:
str_commas = soil_definition.split(',')
print(str_commas)
print(str_commas[4]) # Now we can access individual pieces of the string


['The layer(s) of generally loose mineral and/or organic material \nthat are affected by physical', ' chemical', ' and/or biological processes\nat or near the planetary surface and usually hold liquids', ' gases', ' \nand biota and support plants.']
 
and biota and support plants.


In [22]:
# Split a single string. This method is useful when information is encoded in file names and URLs.
data = 'lat_36.7_lon_-97.5_elev_345_meters'.split('_')
print(data)
print(data[1])
lat = float(data[1]) # Convert string into float and store it in a variable called 'lat'


['lat', '36.7', 'lon', '-97.5', 'elev', '345', 'meters']
36.7
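When a name alternates labels and values like the one above, the split pieces can be paired up instead of being indexed one by one. The sketch below assumes that every label is immediately followed by its value; the variable names `stem` and `coords` are illustrative only.

```python
# Same hypothetical file-name stem used above
stem = 'lat_36.7_lon_-97.5_elev_345_meters'
parts = stem.split('_')

# parts[0::2] holds the labels and parts[1::2] holds the values.
# zip() stops at the shorter sequence, so the trailing 'meters'
# token (a unit with no value after it) is ignored.
coords = {label: float(value) for label, value in zip(parts[0::2], parts[1::2])}
print(coords)
print(coords['lon'])
```

This only works for names that follow the label/value pattern strictly; a name with a different layout would need its own parsing rule.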


## Replace characters

In [24]:
# Use the replace method to swap or strip a character (or sequence of characters) in a string
print('file_name'.replace('_', ''))      # Remove underscore
print('file_name'.replace('_', '_plot_')) # Replace underscore with '_plot_'

filename
file_plot_name
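By default `replace` changes every occurrence. It also accepts an optional third argument that limits how many occurrences are replaced, which may help when only part of a name should change. The file name below is made up for illustration.

```python
# Hypothetical file name: a date followed by a label
name = '2020_05_01_plot'

# Replace only the first two underscores, leaving the last one untouched
dated_name = name.replace('_', '-', 2)
print(dated_name)
```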


## Join strings

In [6]:
# Concatenate strings in a list using a custom delimiter

# Useful when building file names and URLs. You can pass a tuple or a list.
texture_list = ["Silty", "clay", "loam"]
print(" ".join(texture_list))
print("-".join(texture_list))


Silty clay loam
Silty-clay-loam


In [5]:
# Concatenate strings in a tuple using a custom delimiter

texture_tuple = ("Silty", "clay", "loam")
print(" ".join(texture_tuple))

Silty clay loam
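`join` only accepts strings, so a list that contains numbers has to be converted first. A minimal sketch, assuming a made-up list of values named `values`:

```python
# Hypothetical list of numeric values (latitude, longitude, elevation)
values = [36.7, -97.5, 345]

# Convert each number to a string before joining;
# passing the numbers directly would raise a TypeError
row = ",".join(str(v) for v in values)
print(row)
```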


In [10]:
# Concatenate strings using the '+' sign

filename = "myfile"
extension = ".csv"
path = "/User/Documents/Datasets/"
fullpath = path + filename + extension
print(fullpath)


/User/Documents/Datasets/myfile.csv
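For file paths specifically, the standard-library `pathlib` module is a common alternative to the `+` sign because it inserts the separators for us. The sketch below reuses the folder and file name from the cell above; `PurePosixPath` is used here only so that the result looks the same on any operating system.

```python
from pathlib import PurePosixPath

folder = PurePosixPath("/User/Documents/Datasets")

# The / operator joins path components; with_suffix() appends the extension
fullpath = (folder / "myfile").with_suffix(".csv")
print(fullpath)
```

When working with real files on the local machine, `pathlib.Path` would typically be used in the same way.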


In [30]:
# Merge strings and numbers

A = 2
print('A = ' + str(A)) # Need to convert the integer to a string using the str() function

A = 2


## Formatting strings

In [18]:
# Check whether ALL the characters in the string are numbers 

str.isnumeric('20')   # Returns True


True

In [19]:
# Periods are not numeric characters!
str.isnumeric('2.0')  # Returns False

False
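Because `isnumeric` rejects periods and minus signs, it is not a reliable test for decimal or negative numbers. One common workaround, sketched here with a hypothetical helper called `is_number`, is to attempt the conversion and catch the error:

```python
def is_number(text):
    """Return True if text can be converted to a float (illustrative helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print(is_number('20'))
print(is_number('2.0'))
print(is_number('-97.5'))
print(is_number('clay'))
```

Keep in mind that `float` also accepts strings such as `'nan'` and `'inf'`, so this helper treats them as numbers too.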

In [23]:
# Traditional way of formatting strings, also known as %-formatting
print("The sky is %s as the ocean" % "blue") # Still supported, but newer code usually prefers str.format() or f-strings
print("The sky is %s as the ocean" , "blue") # No substitution happens: the comma passes two separate arguments to print()

# Newer str.format() method
print("The sky is {} as the ocean".format("blue")) # Positional arguments fill the braces in order
print("The sky is {color} as the ocean".format(color="blue")) # Named fields are easier to read (recommended)


The sky is blue as the ocean
The sky is %s as the ocean blue
The sky is blue as the ocean
The sky is blue as the ocean
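Python 3.6 and later also provide f-strings, which place the variable name directly inside the braces. A minimal sketch of the same sentence, assuming a variable named `color`:

```python
color = "blue"

# The f prefix tells Python to evaluate the expression inside the braces
sentence = f"The sky is {color} as the ocean"
print(sentence)
```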


In [24]:
# Print in new line
print("\nJan\nFeb\nMar") # \n represents a new line


Jan
Feb
Mar


In [7]:
# Escaping using the backslash since inches are represented by "
print("The height is 6' 4\"") 

The height is 6' 4"


In [8]:
radius = 1
circle_area = 3.14
print('A circle with a radius of {radius} cm has an area of {circle_area} cm^2'
      .format(radius=radius,circle_area=circle_area))


A circle with a radius of 1 cm has an area of 3.14 cm^2


In [16]:
# Write a label for a plot using %-formatting

# % denotes the beginning of a format specifier
# f stands for float
parvalue = [0.3,0.1,120] # Three parameter values, typically obtained by curve fitting
label = 'fit: a=%5.3f, b=%5.3f, c=%3.0f' % tuple(parvalue)
print(label)

# In %5.3f, 5 is the minimum field width (total characters, including the dot), and 3 is the number of decimal places

# In %03.0f, the leading 0 pads the number with zeros up to a width of 3
print('DOY: %03.0f' % 1)
print('DOY: %03.0f' % 12)
print('DOY: %03.0f' % 365)

# Example for MODIS URL request date
print('DOY: A%03.0f' % 1)
print('DOY: A%03.0f' % 365)

# An alternative to fill with leading zeros
print('34'.zfill(3))


fit: a=0.300, b=0.100, c=120
DOY: 001
DOY: 012
DOY: 365
DOY: A001
DOY: A365
034
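The same width and precision codes can be written after a colon inside an f-string (Python 3.6+). The sketch below reproduces the curve-fitting label and a zero-padded day of the year; the names `a`, `b`, `c`, and `doy` are introduced here for illustration only.

```python
parvalue = [0.3, 0.1, 120]
a, b, c = parvalue  # Unpack the three parameter values

# 5.3f: minimum width of 5 characters with 3 decimal places
label = f'fit: a={a:5.3f}, b={b:5.3f}, c={c:3.0f}'
print(label)

# 03d: integer padded with leading zeros to a width of 3
doy = 12
doy_label = f'DOY: A{doy:03d}'
print(doy_label)
```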


## Comparing strings

In [28]:
# Compare strings
print('April' == 'May')
print('April' == 'april') # Case matters

print('april'.capitalize())
print('april'.upper())

False
False
April
APRIL
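Since case matters, a comparison that should ignore capitalization usually converts both strings to the same case first. A minimal sketch with two made-up variables:

```python
month_a = 'April'
month_b = 'april'

# Convert both sides to lowercase before comparing
same_month = month_a.lower() == month_b.lower()
print(same_month)
```

For text that may contain non-English characters, `casefold()` is a more aggressive alternative to `lower()`.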


## String sequences

In [30]:
# Count individual characters
total_p = soil_definition.count('p') # Count the number of 'p' characters in the soil definition
print(total_p)

# The same method works for any other character or sequence of characters
soil_definition.count('s') # The result is not shown because it is neither printed nor stored in a variable

# Find if the string starts with one of the following sequences
print(soil_definition.startswith(('app','ora'))) # Note that the input is a tuple: ('app','ora')

# Find if the string ends with one of the following sequences
print(soil_definition.endswith(('ple','nge')))   # Note that the input is a tuple ('ple','nge')

6
False
False
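To check for a sequence anywhere in a string, not only at the start or end, Python offers the `in` operator and the `find` method. A short sketch using a made-up texture name:

```python
text = 'Silty clay loam'

# The in operator returns True or False
has_clay = 'clay' in text
print(has_clay)

# find() returns the index where the sequence starts, or -1 if it is absent
position = text.find('clay')
missing = text.find('sand')
print(position, missing)
```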
