Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

696 lines (464 sloc) 14.704 kb
0:00:00.000,0:00:04.533
Hi, and welcome back to the channel of Perl Tutorial
0:00:04.533,0:00:06.599
This time I'm going to talk about
0:00:06.599,0:00:08.733
a number of subroutines
0:00:08.733,0:00:12.800
functions, that can handle strings in perl
0:00:12.800,0:00:14.768
There are couple of basic ones like...
0:00:14.768,0:00:16.463
Let's switch to the other window
0:00:16.463,0:00:20.932
So, there are couple of basic functions, for example
0:00:20.932,0:00:24.056
lc, that will return a lowercase version of a string
0:00:24.056,0:00:26.423
uc, that will return uppercase version
0:00:26.423,0:00:30.324
and length that will return a number of characters in that string
0:00:30.324,0:00:34.643
So, I wrote a couple example files
0:00:34.643,0:00:38.000
What you can see here is $str, which is a string
0:00:38.000,0:00:40.936
HeLlo with a capital L here and lowercase l
0:00:40.936,0:00:43.533
And if you run the script
0:00:43.533,0:00:46.000
Lets run it, so you can see, how it acually works
0:00:46.000,0:00:50.712
You see, that the first will return "hello" in lowercase
0:00:50.712,0:00:53.267
The second one in all uppercase
0:00:53.267,0:00:57.213
and the last one just returns the number of characters
0:00:57.213,0:01:02.206
Neither of these acually change the string
0:01:02.206,0:01:05.781
if you want to change the string, than you type in:
0:01:05.781,0:01:10.159
$str = lc $str;
0:01:10.174,0:01:14.490
and than you print out the $str again
0:01:14.490,0:01:18.420
If I run this now than you will see that
0:01:18.420,0:01:22.274
now $str is already lowercase
0:01:22.274,0:01:25.405
These are really the simple ones
0:01:25.405,0:01:27.799
There is one slightly more complex function
0:01:27.799,0:01:30.230
which is called index
0:01:30.230,0:01:32.856
Index will get normally two parameters
0:01:32.856,0:01:36.186
one of them is string like this one
0:01:36.186,0:01:38.686
Let's enlarge the font
0:01:38.686,0:01:43.371
the first parameter is a string
0:01:43.371,0:01:45.102
and that's what you can see here
0:01:45.102,0:01:47.773
and the second one is also a string
0:01:47.773,0:01:50.017
that is usually shorter
0:01:50.017,0:01:55.867
because you're looking for it's location within the first parameter
0:01:55.867,0:01:59.814
so, if you run index on this longer string, giving it "cat"
0:01:59.814,0:02:01.765
it will return you 10
0:02:01.765,0:02:06.269
because "cat" starts from the 10. character
0:02:06.269,0:02:11.425
Index counts characters starting with 0
0:02:11.425,0:02:13.561
so this capital T is actually 0
0:02:13.561,0:02:17.651
that's how it comes to 10 at "c"
0:02:17.651,0:02:22.266
If I run this same index function, now giving it "dog"
0:02:22.266,0:02:24.133
than it will return -1
0:02:24.133,0:02:26.667
because in this string there is only a cat
0:02:26.667,0:02:28.685
there is no "dog"
0:02:28.685,0:02:33.605
The way it returns error or faliture is giving us -1
0:02:33.605,0:02:37.942
If you call this function giving a string
0:02:37.942,0:02:42.667
that doesn't appear in the original string you'll get -1
0:02:42.667,0:02:45.419
It might be not the most convenient way
0:02:45.419,0:02:48.723
but that's what we have to use
0:02:48.723,0:02:55.199
If I run index on the string "The", capital T
0:02:55.199,0:02:56.932
it will return 0
0:02:56.932,0:03:01.599
That's just showing, that the index is 0 based
0:03:01.599,0:03:04.733
So it will find this "The" at the beginning
0:03:04.733,0:03:07.709
and it will return 0, because that's how it starts
0:03:07.709,0:03:12.000
Next call, this one, returns 26
0:03:12.000,0:03:16.733
because index is looking for the exact, same string
0:03:16.733,0:03:19.266
and it's case sensitive as well
0:03:19.266,0:03:23.012
so first "The" doesn't fit, because that's a capital T
0:03:23.012,0:03:26.633
so it will look for another "the" and there it will find it
0:03:26.633,0:03:30.953
Now, you might think index is looking for the word
0:03:30.953,0:03:34.236
but it's not the case, it just looks
0:03:34.236,0:03:38.943
for a substring, which is just characters one after another
0:03:38.943,0:03:41.266
In the next example you can see
0:03:41.266,0:03:44.266
that we are looking for the string
0:03:44.266,0:03:47.599
letter "e" and the space afterwards
0:03:47.599,0:03:49.332
and it will return us 2
0:03:49.332,0:03:52.067
because here you can see
0:03:52.067,0:03:55.707
highlighted the place where it appears
0:03:55.707,0:04:00.395
but as you can see there is a "e " here
0:04:00.395,0:04:05.457
the way you can reach that is index can get third parameter
0:04:05.457,0:04:11.401
that's here the number 3, but can be any number
0:04:11.401,0:04:16.867
and it will start searching from the third position
0:04:16.867,0:04:19.762
which is acually 4th character
0:04:19.762,0:04:22.867
beacuse you start counting form 0
0:04:22.867,0:04:25.427
so the third place, place number three
0:04:25.427,0:04:29.420
and therefore it will find the "the" here
0:04:29.420,0:04:36.841
at this spot, which is appearentely the 28. place
0:04:36.841,0:04:42.067
Similarly, if I call index and give it character "e"
0:04:42.067,0:04:43.307
just for looking
0:04:43.307,0:04:46.557
but I give it the number 3 to start from
0:04:46.557,0:04:49.800
than it will find the letter "e"
0:04:49.800,0:04:51.016
lets see...
0:04:51.016,0:04:53.399
so this in the word "jumped"
0:04:53.399,0:04:55.333
there is character "e"
0:04:55.333,0:04:57.702
so that's what its going to be found
0:04:57.702,0:05:00.350
Obviously this 3 can be any number here
0:05:00.350,0:05:01.928
so if I can start from 5
0:05:01.928,0:05:04.466
and it will return the same value
0:05:04.466,0:05:06.666
Lets run this script
0:05:06.666,0:05:10.000
So, you can see that the values are:
0:05:10.000,0:05:15.838
10, -1, 0, 26, 2, 28, 18
0:05:15.838,0:05:17.681
this is the same value
0:05:17.681,0:05:19.807
It doesn't matter where you start exactly
0:05:19.807,0:05:21.415
as long as you start before
0:05:21.415,0:05:24.770
the appearence of the substring
0:05:24.770,0:05:27.132
And that's what index does
0:05:27.132,0:05:31.132
but there is also function called rindex
0:05:31.132,0:05:32.394
that does the same just looking
0:05:32.394,0:05:33.973
from the right hand side
0:05:33.973,0:05:35.690
It's interesting
0:05:35.690,0:05:38.430
if there're multiple apperances of the same string
0:05:38.430,0:05:42.676
and it will find the rightmost one of it
0:05:42.676,0:05:45.187
Running rindex on the same string
0:05:45.187,0:05:47.649
and giving it the substring "e"
0:05:47.649,0:05:49.507
than it will return 39
0:05:49.507,0:05:53.071
because it will find this letter "e"
0:05:53.071,0:05:56.216
And rindex can also have a third parameter
0:05:56.216,0:05:58.386
that will tell it where to start from
0:05:58.386,0:06:01.902
The position to start from looking for
0:06:01.902,0:06:03.666
So if I give it 38
0:06:03.666,0:06:06.666
than it will find this letter "e" here
0:06:06.666,0:06:07.951
returning 38
0:06:07.951,0:06:11.009
And if I start from 37 here
0:06:11.009,0:06:15.533
than it will, probably, find this "e" returning 33
0:06:15.533,0:06:17.346
So that's about index
0:06:17.346,0:06:20.552
The last one, which is probably the most interesting one
0:06:20.552,0:06:22.533
out of the functions that I'm showing now
0:06:22.533,0:06:24.507
is substring
0:06:24.507,0:06:27.800
Substring is actually the opposite of index
0:06:27.800,0:06:31.882
In index you would know what you're looking for
0:06:31.882,0:06:33.182
what string you're looking for
0:06:33.182,0:06:34.670
just don't know where it is
0:06:34.670,0:06:36.295
So you're looking for a location
0:06:36.295,0:06:38.478
In substring you know the location
0:06:38.478,0:06:40.288
you just don't know what is there
0:06:40.288,0:06:42.547
and you want to see what is in that location
0:06:42.547,0:06:45.699
Substring will get normally a string,
0:06:45.699,0:06:48.531
again this "The black cat climbed the green tree";
0:06:48.531,0:06:51.713
a number, which is an offset, so it is the location
0:06:51.713,0:06:54.266
to beginning of the substring we are looking for
0:06:54.266,0:06:56.379
in this original string,
0:06:56.379,0:07:01.348
again it's 0 based, so 0 would mean the first character
0:07:01.348,0:07:06.156
and 4 means this "b";
0:07:06.156,0:07:09.685
and the third parameter is the length
0:07:09.685,0:07:11.542
of the substring we are fetching
0:07:11.542,0:07:13.423
So because it has 5 characters
0:07:13.423,0:07:16.557
it will return the string, starting from this letter "b"
0:07:16.557,0:07:20.320
...5 characters, so that's the word "black"
0:07:20.320,0:07:22.733
Substring returns those values
0:07:22.733,0:07:24.768
and obviously I can assign it to...
0:07:24.768,0:07:31.233
what acually I showed you in lc and index example
0:07:31.233,0:07:34.158
... you can assign that value to some kind of variable
0:07:34.158,0:07:38.432
Here too, I'm just lazy so I'm just printing it out immediately
0:07:38.432,0:07:43.045
This call will return the word "black"
0:07:43.045,0:07:46.067
Now in the next example you can see
0:07:46.067,0:07:49.437
that the length is a negative number
0:07:49.437,0:07:51.733
So how would a negative number come there?
0:07:51.733,0:07:56.449
Instead of using that as a real length
0:07:56.449,0:08:00.583
what Perl does, is saying that this "4" means
0:08:00.583,0:08:03.415
how many characters from the left to leave off
0:08:03.415,0:08:05.180
and this number now says
0:08:05.180,0:08:08.333
how many characters from the right not to include
0:08:08.333,0:08:12.067
So, basically, it says count 4 from the left,
0:08:12.067,0:08:14.446
count 11 from the right
0:08:14.446,0:08:15.924
and whatever is in the middle
0:08:15.924,0:08:17.394
- that's what I want to get back
0:08:17.394,0:08:22.920
So that's how the "black cat climbed the" is returned
0:08:22.920,0:08:26.666
You can even leave off the whole third parameter at all
0:08:26.666,0:08:29.942
and that would mean in this case, as you can see:
0:08:29.942,0:08:34.532
just give me string from the position 14 to the end
0:08:34.532,0:08:38.326
because I didn't see how long it should take
0:08:38.326,0:08:42.461
So you just give it: from the position to the end
0:08:42.461,0:08:44.440
And here you can see,
0:08:44.440,0:08:47.265
that even the first parameter, the location
0:08:47.265,0:08:49.055
can be a negative number
0:08:49.055,0:08:51.726
and that, as you propably already guessed,
0:08:51.726,0:08:53.353
is counting from the end
0:08:53.353,0:08:57.594
So instead of saying it's a the length less 4
0:08:57.594,0:08:59.793
you just say less 4
0:08:59.793,0:09:03.994
and that means: start from this position
0:09:03.994,0:09:06.994
And because in this example here
0:09:06.994,0:09:09.793
I didn't give the number of characters
0:09:09.793,0:09:12.553
so, it will give me all the string till the end
0:09:12.553,0:09:16.373
that's just the 4 letters "tree"
0:09:16.373,0:09:19.076
Obviously, as you can see in this example,
0:09:19.076,0:09:21.619
you can give the third parameter as well
0:09:21.619,0:09:23.660
and than it will return you
0:09:23.660,0:09:27.594
the two characters from the last four
0:09:27.594,0:09:29.394
and that's "tr"
0:09:29.394,0:09:31.757
Let's just run it, so we can see
0:09:31.757,0:09:33.994
that's the "tr" here, right
0:09:33.994,0:09:36.326
And, if I change it a little bit,
0:09:36.326,0:09:40.040
for example put "1" here,
0:09:40.040,0:09:45.630
than run it and you can get only the "t"
0:09:45.630,0:09:49.421
And, if I would give "3" and run,
0:09:49.421,0:09:54.395
than I get "tre", so that you can understand it
0:09:54.395,0:09:57.247
The last example is a bit funky
0:09:57.247,0:10:00.462
So up untill now this call returned substring,
0:10:00.462,0:10:05.719
but the original $str, the original string didn't change
0:10:05.719,0:10:09.970
In the last example, the return value is still the same
0:10:09.970,0:10:13.447
it's defined by the first three parameters
0:10:13.447,0:10:15.616
but in the last example, this example
0:10:15.616,0:10:17.847
we also have fourth parameter,
0:10:17.847,0:10:23.457
which is, in this case, a string containing "jumped from"
0:10:23.457,0:10:26.309
And what happends is that, in this case,
0:10:26.309,0:10:31.663
Perl substring function will replace the string
0:10:31.663,0:10:36.180
that was defined by this three parameters
0:10:36.180,0:10:39.328
will replace that substring by the new string
0:10:39.328,0:10:49.447
So, because 14, 7 here defines the word "climbed" here,
0:10:49.447,0:10:55.796
the return value, in $z, will be "climbed",
0:10:55.796,0:10:59.796
but the $str will have a different content
0:10:59.796,0:11:04.734
and the word "climbed" will be replaced by the word "jumped from",
0:11:04.734,0:11:07.816
by the two words, substring actually
0:11:07.816,0:11:10.713
In this case $str will contain:
0:11:10.713,0:11:13.380
"The black cat jumped from the green tree"
0:11:13.380,0:11:15.970
And as you can see, Perl doesn't care that
0:11:15.970,0:11:20.221
the new string, new substring is actually longer
0:11:20.221,0:11:22.236
than the original one
0:11:22.236,0:11:24.955
it will just make enough space in the string
0:11:24.955,0:11:28.159
or if you leave it out, so for example
0:11:28.159,0:11:29.836
Let's just run it again
0:11:29.836,0:11:31.129
so you can see that
0:11:31.129,0:11:33.242
"The black cat jumped from the green tree"
0:11:33.242,0:11:34.513
that works
0:11:34.513,0:11:38.882
What happens, if I just put in the letter "j"
0:11:38.882,0:11:40.878
and run it? Than you will see that
0:11:40.878,0:11:43.334
"The black cat j the green tree"
0:11:43.334,0:11:45.498
whatever that would mean
0:11:45.498,0:11:49.534
It doesn't really matter, what the lengths are relative to each other
0:11:49.534,0:11:51.447
You can input an empty string,
0:11:51.447,0:11:53.939
but you have to have it here
0:11:53.939,0:11:59.275
and than Perl substring will cut out that part of the string
0:11:59.275,0:12:01.196
and leave, in this case, two spaces,
0:12:01.196,0:12:03.955
because the space before word "climbed"
0:12:03.955,0:12:05.713
and the space after
0:12:05.713,0:12:07.565
So, that's about substring
0:12:07.565,0:12:09.247
I hope you've learned something
0:12:09.247,0:12:11.165
and we will return to the next part
0:12:11.165,0:12:12.898
and if you are watching it on YouTube
0:12:12.898,0:12:15.196
please subscribe to the channel
0:12:15.196,0:12:17.446
Thank you, bye bye
Jump to Line
Something went wrong with that request. Please try again.